GIST Entropy improvements #947

fwaibl · 2022-02-16T15:15:06Z

This pull request contains two changes to the GIST entropy calculation:

Correction term for the "reference volume" in the nearest neighbor entropy. This significantly improves convergence of the entropy in bulk, at the cost of a minor slow-down. There is a flag "oldnnvolume" to recover the old behavior.
This implements a new version of the nearest neighbor search. The new version allow the user to choose how many layers of neighboring voxels should be used, using a new option "nnsearchlayers". With nnsearchlayers 1 (default), the results are almost exactly the same. Using a higher value improves bulk convergence of the entropy. Furthermore, the new implementation deals differently with waters that are numerically at the same position (this is very rare).

Timings (Action Post, using 10000 frames of a pure water simulation):

old, already including the previous speedup: 258s
with the volume correction: 267 s
with new NN search (1 layers), and the volume correction: 287 s
with new NN search (5 layers), and the volume correction: 439 s (all values >1 should have similar timing, since voxels are omitted if it can be proven that the nearest neighbor is already found)

I attach a plot of the average entropy contributions calculated from a pure water MD (lower is better, since the entropy is relative to bulk). Note the log scale.

In my opinion, the volume correction term is a significant improvement, since it reduces the bulk entropy a lot, and the slow-down is very small. I am not so sure about the new NN search however, since it it less relevant for the convergence (the log scale makes it look bigger, but in fact most of the error is compensated by the volume correction), and the calculation is always a bit slower. I also don't know exactly why it is slower with "nnsearchlayers 1".

Another question: I added some unit tests in the unitTests folder. To do so, I needed to move the new functionality into a separate header (GistEntropyUtils.h) and cpp file. Is the file naming etc. ok with you? I could easily move everything into Action_GIST.cpp, but then I can't use the unit tests any more.

In GIST, the six-dimensional entropy calculation requires the translational and orientational distance to the nearest neighbor (NN). However, if the translational distance is larger than the 6D distance to the current NN, the orientational distance can be skipped safely. This is a significant speedup (about 2x in my test), since the orientational distance contains an arccos.

The nearest-neighbor density is approximated by finding the distance to the nearest neighbor of each point, computing a sphere volume from this distance, and comparing this to the volume that is expected from a homogeneous density. In GIST, the sphere volume in rotational space and the combined translation+rotation space was approximated by a Taylor expansion. This patch implements the exact volume in rotational space analytically, and uses a tabulated correction factor for distances up to 10 Angstrom to correct the 6-D volume. The result is a better convergence of the bulk entropy, at the cost of about 3.4% calculation time (Action Post) in my test (a pure water box containg 3333 water molecules, using 10000 frames). The bulk entropy (should be zero) is reduced from 2.3e-4 to 5e-5 kcal/mol/A^3 for dTSsix and from 7.7e-4 to 2.3e-4 for dTSorient. The individual voxel values are almost perfectly correlated to the old results (R^2=0.99996).

In the old algorithm, neighbors were searched in the respective voxel and in all of its direct neighbors. The new algorithm lets the user choose how many layers of neighboring voxels should be search. The default is 1, which corresponds to the old behavior. There is a tiny difference in how distances of 0 are treated. In the old code, they were omitted. In the new code, there is a constant (GIST_TINY) which is used if the resulting distance is 0. This leads to numerically different values in about 0.01% of the voxels.

…er-MD-master

AmberJenkins · 2022-02-16T15:26:01Z

The PGI build in Jenkins failed.

fwaibl · 2022-02-16T15:28:37Z

Obviously, the tests are failing because the GIST results are not the same any more. How is this usually handled in cpptraj? Do we need to increase the major (or minor) version number, or do we wait until a new cpptraj version is released?

fwaibl · 2022-02-16T15:31:41Z

If we need to obtain exactly the same results, I can replace the "oldnnvolume" flag for a "newnnvolume" one and make the old behavior the default. Further, I would have to revert the behavior for water molecules at the same position to the old behavior (I am not sure which behavior is actually preferred...)
Should I do that?

drroe · 2022-02-16T18:56:29Z

Obviously, the tests are failing because the GIST results are not the same any more. How is this usually handled in cpptraj? Do we need to increase the major (or minor) version number, or do we wait until a new cpptraj version is released?

So typically the way I have been versioning is revisions are for changes which do not change existing output, minor versions are for when input/output changes, and major versions are for when a significant part of the code (API etc) are altered. In this case, we would probably want a minor version bump (resetting the revision number) since output changes (see description in sec/Version.h).

If we need to obtain exactly the same results, I can replace the "oldnnvolume" flag for a "newnnvolume" one and make the old behavior the default.

Is there any advantage to retaining the old behavior? The new method seems like it is better behaved. However, there is often a benefit to retaining old behavior (for e.g. reproducing old results), particularly if the new results are numerically different. @gosldorf @EricChen521 @tkurtzman what do you think?

@fwaibl if we decide to make this the default, the test results (after being validated) should be updated. If we decide to make it an option, a test should be added for this option. Let me know if this makes sense.

drroe · 2022-02-16T19:57:01Z

@fwaibl also I forgot to mention this, but please run make depend in the src/ directory so that the file dependencies are all updated. Thanks!

AmberJenkins · 2022-02-17T09:41:24Z

The PGI build in Jenkins failed.

drroe · 2022-02-17T13:26:00Z

I'm going to bring your branch up to date. Let me know if it causes any issues.

AmberJenkins · 2022-02-17T13:37:08Z

The PGI build in Jenkins failed.

In the GIST input, the gridcntr and griddim arguments were previously parsed as positional arguments (but if the keyword was present), rather than being parsed behind the keyword. I.e., gist 0.6 1.0 1.6 50 gridcntr griddim 60 70 gridspacn 0.5 out gist.dat refdens 0.0329 was valid, while gist griddim 50 60 70 gridspacn 0.5 out gist.dat refdens 0.0329 gridcntr 0.6 1.0 1.6 was invalid (because the center values had to go first) This commit changes the parsing to the expected bevavior, i.e., the first example is now invalid and the second is valid.

Changes: * after the nearest neighbor (NN) search, introduces a check whether the distance is smaller than sqrt(GIST_HUGE) (i.e., a NN has been found). If not, this distance is omitted from the entropy calculation. * fix a bug where some non-border voxels were omitted instead of voxels on the border. (GIST voxels on the border of the grid are supposed to obtain 0 entropy) * don't look for nearest neighbors if we are on the border. Explanation: In GIST, the entropy is calculated per water molecule based on the NN distance, and binned to a grid. With the recent change of the entropy calculation, I removed some checks that: * omit the entropy of waters if no NN distance is found. Instead, a high value (sqrt(GIST_HUGE)) is assigned to the distance. * omit the NN if the distance to it is numerically close to zero. Instead, a small value (GIST_TINY) is then assigned to the distance. The second first behavior is reasonable (omitting the NN means that an arbitrary higher distance is chosen, and GIST_TINY is probably smaller than that arbitrary distance, i.e., closer to the real one. However, the first one is not so reasonable, because sqrt(GIST_HUGE) is probably far off from the real NN distance. Therefore, I re-introduce the old behavior in this case. This is very relevant for the results of the cpptraj tests, since they use very few frames, leading to many cases of high NN distances.

* This adds 2 tests for the new flags oldnnvolume and nnsearchlayers. The .save files for oldnnvolume (Gist6) are just renamed from the previous Gist2 test. The other test .save files are changed to match the new entropy results. * Increase cpptraj version to 6.3.0

fwaibl · 2022-02-25T12:53:30Z

I updated the tests and also found a bug meanwhile (in the changed entropy - same pull request), where some voxels that are not on the border of the grid were assigned zero entropy values.
I also took the liberty of changing the command line parsing, which was IMO buggy (it was previously assumed that gridcntr is specified before griddim, and the order of the arguments was independent of the keywords - see the commit message for examples.)

drroe · 2022-02-25T13:51:10Z

src/Action_GIST.cpp

+    nx = dimArgs.getNextInteger(-1);
+    ny = dimArgs.getNextInteger(-1);
+    nz = dimArgs.getNextInteger(-1);
+  }
  griddim_ = Vec3((double)nx, (double)ny, (double)nz);


Should exit with an error here if negative dimension values have been specified. I should have done this in the first place.

Ok, I can add this quickly. I guess it should also exit if a dimension is zero?

I guess it should also exit if a dimension is zero?

Yes definitely - I don't think there's any situation where one of the dims can be zero and it would be a valid calculation.

drroe · 2022-02-25T13:55:07Z

src/Action_GIST.cpp

    mprintf("Warning: No grid center values specified, using default (origin)\n");
+    gridcntr_ = Vec3(0.0);
+  } else {
+    gridcntr_[0] = centerArgs.getNextDouble(-1);


I like this change. If we want to be extra careful we can make sure the arguments are valid doubles with the validDouble() routine from StringRoutines.h, e.g.

if (!validDouble(centerArgs[0])) { mprinterr("Error: '%s' is not a valid argument for 'gridcntr'\n"); return Action::ERR; } gridcntr_[0] = convertToDouble(centerArgs[0]);

and so on for the other arguments. Can do something similar for the grid dimensions with validInteger().

drroe · 2022-02-25T13:58:22Z

src/GistEntropyUtils.h

@@ -0,0 +1,88 @@
+#ifndef GIST_ENTROPY_UTILS_H
+#define GIST_ENTROPY_UTILS_H
+


Ideally we could put all of this in a namespace, e.g.

namespace Cpptraj { namespace GistEntropyUtils {

I've done this with the clustering routines recently (see the Cluster subdirectory) and I think it makes for nicer code. For some things however it doesn't work since I made Cpptraj a class (when I started cpptraj I knew far less about good C++ practices than I do now). I think since only the GIST action will pull this in it would be ok. Let me know what you think.

drroe · 2022-02-25T13:59:09Z

src/GistEntropyUtils.h

+
+typedef std::vector<float> Farray;
+
+#define GIST_HUGE 10000.0


It would be better to make this a const double in a GistEntropyUtils namespace, see above.

drroe · 2022-02-25T14:01:59Z

@fwaibl I just made a bunch of comments, let me know if you'd like me to handle any of these if you don't have the time. I think I should be able to work on this PR somehow. I can also accept as-is and make the changes in a future PR. Let me know what works best for you.

Thanks for this, it's great work!

drroe · 2022-02-25T14:03:27Z

Also be aware that the code deadline for the next Amber release is March 11, so any changes we want to get into the next Amber release should be done a few days before that.

fwaibl · 2022-02-25T14:11:58Z

Thanks for your comments. I can add them on Monday, but probably not today. Otherwise, feel free to change something yourself. I think you have write access, since you already pushed a merge commit to my fork (but I am no expert on git...)

Also be aware that the code deadline for the next Amber release is March 11, so any changes we want to get into the next Amber release should be done a few days before that.

That's good to know. I am confident we can finish this pull request until then, but I probably won't be on time for the extension to ions and multiple solvents, since it turns out that there is quite some stuff that needs to be changed (mostly bookkeeping, i.e., which atoms are solute/solvent, etc.)

AmberJenkins · 2022-02-25T14:22:55Z

The PGI build in Jenkins failed.

* Move GistEntropyUtils functions into a GistEntropyUtils namespace * make GIST_HUGE a const double (instead of a macro) * add validDouble and validInteger checks for griddim and gridcntr arguments.

drroe · 2022-02-28T13:34:23Z

@fwaibl thanks!

fwaibl added 6 commits January 28, 2022 14:46

Increase cpptraj revision to 6.2.2

411887f

Bump version (revision) number to 6.2.3.

49667a7

Merge branch 'master' of https://github.com/Amber-MD/cpptraj into Amb…

fcac321

…er-MD-master

Update cpptrajdepend (make depend) after GIST entropy changes

a840937

Merge branch 'master' into master

7c930b6

fwaibl and others added 6 commits February 22, 2022 15:35

Remove declaration of unused TransEntropy function

10723f6

Merge branch 'Amber-MD:master' into master

e56c6fa

Merge branch 'master' of github.com:fwaibl/cpptraj

92c22fd

drroe reviewed Feb 25, 2022

View reviewed changes

fwaibl and others added 2 commits February 28, 2022 09:28

GIST code cleanup and command line checking

48f385f

* Move GistEntropyUtils functions into a GistEntropyUtils namespace * make GIST_HUGE a const double (instead of a macro) * add validDouble and validInteger checks for griddim and gridcntr arguments.

Merge branch 'Amber-MD:master' into master

e7a139d

drroe merged commit 8a1543b into Amber-MD:master Feb 28, 2022

		@@ -0,0 +1,88 @@
		#ifndef GIST_ENTROPY_UTILS_H
		#define GIST_ENTROPY_UTILS_H

GIST Entropy improvements #947

GIST Entropy improvements #947

Uh oh!

Conversation

fwaibl commented Feb 16, 2022

Uh oh!

AmberJenkins commented Feb 16, 2022

Uh oh!

fwaibl commented Feb 16, 2022

Uh oh!

fwaibl commented Feb 16, 2022

Uh oh!

drroe commented Feb 16, 2022

Uh oh!

drroe commented Feb 16, 2022

Uh oh!

AmberJenkins commented Feb 17, 2022

Uh oh!

drroe commented Feb 17, 2022

Uh oh!

AmberJenkins commented Feb 17, 2022

Uh oh!

fwaibl commented Feb 25, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

drroe Feb 25, 2022

Choose a reason for hiding this comment

Uh oh!

fwaibl Feb 25, 2022

Choose a reason for hiding this comment

Uh oh!

drroe Feb 25, 2022

Choose a reason for hiding this comment

Uh oh!

drroe Feb 25, 2022

Choose a reason for hiding this comment

Uh oh!

drroe Feb 25, 2022

Choose a reason for hiding this comment

Uh oh!

drroe Feb 25, 2022

Choose a reason for hiding this comment

Uh oh!

drroe commented Feb 25, 2022

Uh oh!

drroe commented Feb 25, 2022

Uh oh!

fwaibl commented Feb 25, 2022

Uh oh!

AmberJenkins commented Feb 25, 2022

Uh oh!

drroe commented Feb 28, 2022

Uh oh!

Uh oh!

fwaibl commented Feb 25, 2022 •

edited

Loading