Skip to content

Conversation

hl2500
Copy link
Contributor

@hl2500 hl2500 commented Apr 3, 2023

Added an optional gaussian quadrature functionality to estimate the TI integral.

This functionality could be triggered when initializing: TI(gauss_qua=True)

The lambda values and weights were referred the Amber manual (page 493): http://ambermd.org/doc12/Amber22.pdf

The formula used to estimate the free energy and standard errors were referred the the Amber manual (eq. 25.2) and this literature (eq. 3): https://pubs.acs.org/doi/10.1021/acs.jcim.2c01052?ref=pdf, respectively.

The final two dataframes with free energy values and errors are upper triangluar matrix, because the values are cumulative sum upto each lambda state rather than differences. The diagonal values are the estimation at each lambda.

I have tested 12 lambdas and 9 lambdas with Amber one-step method to estimate the free energy and error of each leg in the thermodynamic cycle. The results are comparable to the ones estimated with trapezoidal rule. The other number of lambdas (1, 2, 3, 5, 7) should work as well.

Add an optional gaussian quadrature functionality to estimate the integral
@codecov
Copy link

codecov bot commented Apr 3, 2023

Codecov Report

Merging #304 (13764cb) into master (f5cf43a) will increase coverage by 0.05%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #304      +/-   ##
==========================================
+ Coverage   98.76%   98.81%   +0.05%     
==========================================
  Files          27       28       +1     
  Lines        1778     1860      +82     
  Branches      391      402      +11     
==========================================
+ Hits         1756     1838      +82     
  Misses          2        2              
  Partials       20       20              
Impacted Files Coverage Δ
src/alchemlyb/estimators/__init__.py 100.00% <100.00%> (ø)
...rc/alchemlyb/estimators/ti_gaussian_quadrature_.py 100.00% <100.00%> (ø)

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a really quick look. In addition to my comments you also need to

  • add tests that cover all your code
  • add documentation (example to the TI page, more details to the TI class) and cite any relevant papers
  • add an entry to CHANGES
  • add yourself to AUTHORS

Addressing the changes is no guarantee that the PR is accepted; addressing them is a necessary but not sufficient condition. I'd also like to have @xiki-tempula look at it and @mrshirts if he can spare some time (mainly on the "is it worth it", based on the papers you cited)

Comment on lines 33 to 39
_gq_delta_f_ : DataFrame, optional
The estimated dimensionless cumulative sum of free energy up to each state
with gaussian quadrature.

_gq_d_delta_f_ : DataFrame, optional
The estimated statistical uncertainty (one standard deviation) in
the cumulative sum of free energy up to each state with gaussian quadrature
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will users want to access these data structures? If yes, no underscore and document them as you do.

If they are private attributes then they don't need to be documented.


_gq_d_delta_f_ : DataFrame, optional
The estimated statistical uncertainty (one standard deviation) in
the cumulative sum of free energy up to each state with gaussian quadrature

.. versionchanged:: 1.0.0
`delta_f_`, `d_delta_f_`, `states_` are view of the original object.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'd need to add a versionchanged.


.. versionchanged:: 1.0.0
`delta_f_`, `d_delta_f_`, `states_` are view of the original object.

"""

def __init__(self, verbose=False):
def __init__(self, verbose=False, gauss_qua=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer written out keywords, eg. quadrature.

self.verbose = verbose
self.gauss_qua = gauss_qua
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use_gaussian_quadrature to be clear

lambdas = means.reset_index()[means.index.names[:]].iloc[:].values.sum(axis=1)
num_lambdas = len(lambdas)
# check if the lambda values used match the common ones suggested in Amber manual and assign cooresponding weights
if num_lambdas==1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid code repetition and that many elif statements. Make the code much more compact and extendable by storing the special points and weights in a data structure

special_points = {1:   [[0.5], [1.0]], 
                             2:  [[0.21132, 0.78867], [0.5, 0.5]],
                             3:  [[0.1127, 0.5, 0.88729],  [0.27777, 0.44444, 0.27777]]
                             ...}

and have one code path that pulls its points from the special_points.

if num_lambdas==1:
try:
reference_lambdas = np.round([0.5], 4)
assert (lambdas == reference_lambdas).all()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert is not the correct way to check input. If there's anything wrong with user data you need to raise an exception, probably a ValueError.

Furthermore, never use == to check floating point number equality. Instead use np.allclose() or similar.

Copy link
Collaborator

@xiki-tempula xiki-tempula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to add a new estimator in parallel to the current TI instead of adding it to TI. If you need something that is in TI, might be good to abstract out a BaseTI class.

@hl2500 hl2500 requested a review from orbeckst April 8, 2023 16:34
@hl2500 hl2500 marked this pull request as draft April 8, 2023 16:52
@orbeckst
Copy link
Member

orbeckst commented Apr 8, 2023

Sorry, I don't have much time this/next week to review. I hope @xiki-tempula can have a look. I agree with the suggestion to make it a separate estimator instead of changing standard TI.

@hl2500
Copy link
Contributor Author

hl2500 commented Apr 9, 2023

Thank you for the comments @orbeckst @xiki-tempula I may need some help to add tests to cover all code. I didn't find a testset in alchemtest for Amber gaussian quadrature (with suggested lambda values). Do I also need to upload an example to alchemtest for the purpose of testing and adding an example on TI documentation page? Thanks!

@hl2500 hl2500 requested a review from xiki-tempula April 9, 2023 22:07
Copy link
Collaborator

@xiki-tempula xiki-tempula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR. It looks reasonable, I have had a quick first pass.

  • We need some tests to test this method, you need to add them to alchemtest.
  • Add documentation to explain what is TI_GQ and when and why do we want it.
  • Abstract out the code that is the same between TI and TI_GQ as TIBase and inherit from it.

CHANGES Outdated
@@ -13,6 +13,15 @@ The rules for this file:
* release numbers follow "Semantic Versioning" https://semver.org

------------------------------------------------------------------------------
06/04/2023 hl2500
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*/*/* hl2500

As we don't know when to release the new version yet

CHANGES Outdated
@@ -13,6 +13,15 @@ The rules for this file:
* release numbers follow "Semantic Versioning" https://semver.org

------------------------------------------------------------------------------
06/04/2023 hl2500

* 2.0.1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* Unreleased


class TI_gq(BaseEstimator, _EstimatorMixOut):
"""Thermodynamic integration (TI) with gaussian quadrature estimation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a bit of background of how is this different from the TI

The estimated dhdl of each state.


.. versionchanged:: 2.0.0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need this. Do versionadded instead

# variance calculation assumes no correlation between points
# used to calculate mean
means = dHdl.groupby(level=dHdl.index.names[1:]).mean()
variances = np.square(dHdl.groupby(level=dHdl.index.names[1:]).sem())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the part that is the same in TI and TI_GQ, abstract a TIBase class and put them there. We don't want code duplication.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified this part slightly to use the separate_dhdl(self) function.

variances = np.square(dHdl.groupby(level=dHdl.index.names[1:]).sem())

# extract the lambda values used in the simulations
lambdas = means.reset_index()[means.index.names[:]].iloc[:].values.sum(axis=1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this safe for the dataframe that has more than one lambda values?

Copy link
Contributor Author

@hl2500 hl2500 Apr 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this part to support multi-lambda situation. However, scaling multiple lambdas at the same time is not supported (e.g. (0.04691, 0.02544), (0.23076, 0.12923) ...) and it might not be the right way to perform the simulations

lambdas = means.reset_index()[means.index.names[:]].iloc[:].values.sum(axis=1)
num_lambdas = len(lambdas)
# suggested lambda values corresponding weights
special_points = {1: [[0.5], [1.0]], 2: [[0.21132, 0.78867], [0.5, 0.5]],
Copy link
Collaborator

@xiki-tempula xiki-tempula Apr 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this part more compared with the previous version. It is better to use dispatch than a bunch of if else


return self

def separate_dhdl(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the same as the one in TI?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now this function is different to support multi-lambda situation.

hl2500 added 2 commits April 16, 2023 23:46
update information
@hl2500
Copy link
Contributor Author

hl2500 commented Apr 17, 2023

Thank you so much for the suggestions. I have added two testsets to alchemtest (PR: Add gaussian quadrature test) and made some changes. @orbeckst @xiki-tempula

@hl2500 hl2500 marked this pull request as ready for review April 17, 2023 04:03
@xiki-tempula
Copy link
Collaborator

@hl2500 Thanks for the new test files updated in alchemtest. Do you mind test them in this PR? So I could know what are the test for and how they have been used.

@hl2500
Copy link
Contributor Author

hl2500 commented Apr 18, 2023

@xiki-tempula Okay. Could I ask how to implement test in this PR? Thanks.

@xiki-tempula
Copy link
Collaborator

I think if you just change this line https://github.com/alchemistry/alchemlyb/blob/master/.github/workflows/ci.yaml#L58
to
python -m pip install git+https://github.com/hl2500/alchemtest.git@Add_gaussian_quadrature_test
You should be able to get this in the CI.

add test
@hl2500
Copy link
Contributor Author

hl2500 commented Apr 19, 2023

@xiki-tempula I have changed this line and some of the ubuntu tests failed. Could I ask what should I do? Thanks.

@xiki-tempula
Copy link
Collaborator

Just Codecov acting weird. It is fine now.

@hl2500
Copy link
Contributor Author

hl2500 commented Apr 20, 2023

@xiki-tempula Thanks, could I ask what other tests I need to do?

@xiki-tempula
Copy link
Collaborator

@hl2500 Thanks for the contribution, so the test should be in a format that is similar to https://github.com/alchemistry/alchemlyb/blob/master/src/alchemlyb/tests/test_ti_estimators.py, which is for the TI estimator. Which shall test if the result is numerically good.
The other thing that you need to test is if the test is covering all of the code, you could check the coverage here https://github.com/alchemistry/alchemlyb/pull/304/checks?check_run_id=12878246768, we would like to pump it as high as possible.

Copy link
Collaborator

@xiki-tempula xiki-tempula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for continuously addressing my comments. The code part looks good to me now. We only need to polish the doc a bit and this PR should be good.

each :math:`\lambda` window, the values in the ``delta_f_`` and ``d_delta_f_`` tables are cumulative addition of estimation from one :math:`\lambda` window to another.
To be consistent with :class:`~alchemlyb.estimators.TI` and other estimators, the diagonal values are set to zeros and two end states at :math:`\lambda`
0 and 1 are added, although the simulation may not be performed at :math:`\lambda` 0 and 1. However, they are not considered to estimate the final results.
For example, the values at :math:`\lambda` 0 is 0 and :math:`\lambda` 1 is the same as the previous gaussian quadrature point.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean However, they are not considered to estimate the final results., I was under the impression that the for lambda points [0.1127, 0.5, 0.88729], the result at point 0.88729 is the free energy difference between 0 and 1?

:math:\lambda 0 is 0 is not the same as the previous gaussian quadrature point`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean However, they are not considered to estimate the final results.

Sorry for the confusion. I meant that they didn't contribute to the final free energy values, as there were no estimations at lambda 0 and 1. I have removed this sentence. The next sentence should explain this.

@xiki-tempula
Copy link
Collaborator

@orbeckst Do you mind having a look at this PR as well? I think the code part is good and we just need to polish the doc a bit.

Copy link
Collaborator

@xiki-tempula xiki-tempula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much for all the effort. I think I'm quite happy with this PR now.
However, apologise that we cannot merge this in now as it is a significant and important addition. So I want to make sure that no one would have strong opposition to this PR.
I purpose that we leave the PR here for one month and if no one is feeling strongly about it, I will merge this in (aim for 2.2.0). What do you think @orbeckst ?

@xiki-tempula xiki-tempula added this to the 2.2.0 milestone Jun 17, 2023
@hl2500
Copy link
Contributor Author

hl2500 commented Jun 17, 2023

Thanks very much for all the effort. I think I'm quite happy with this PR now. However, apologise that we cannot merge this in now as it is a significant and important addition. So I want to make sure that no one would have strong opposition to this PR. I purpose that we leave the PR here for one month and if no one is feeling strongly about it, I will merge this in (aim for 2.2.0). What do you think @orbeckst ?

Okay, thank you so much for your guidance and time into this.

@xiki-tempula xiki-tempula modified the milestones: 2.2.0, 2.3.0 Jun 24, 2023
Copy link
Collaborator

@xiki-tempula xiki-tempula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the reference to the alchemtest PR and I will merge this PR next weekend. Thanks for waiting.

@hl2500
Copy link
Contributor Author

hl2500 commented Jul 8, 2023

@xiki-tempula This part has been removed. Thank you!

@xiki-tempula xiki-tempula merged commit 862343b into alchemistry:master Jul 14, 2023
@xiki-tempula
Copy link
Collaborator

I have merge this PR thanks for all the contribution and the patience.

@hl2500
Copy link
Contributor Author

hl2500 commented Jul 15, 2023

@xiki-tempula Thank you again for your guidance in this work!

@xiki-tempula
Copy link
Collaborator

@hl2500 No worries. Looking forward to further contribution from you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants