-
Notifications
You must be signed in to change notification settings - Fork 55
TI with gaussian quadrature #304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TI with gaussian quadrature #304
Conversation
Add an optional gaussian quadrature functionality to estimate the integral
Codecov Report
@@ Coverage Diff @@
## master #304 +/- ##
==========================================
+ Coverage 98.76% 98.81% +0.05%
==========================================
Files 27 28 +1
Lines 1778 1860 +82
Branches 391 402 +11
==========================================
+ Hits 1756 1838 +82
Misses 2 2
Partials 20 20
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a really quick look. In addition to my comments you also need to
- add tests that cover all your code
- add documentation (example to the TI page, more details to the TI class) and cite any relevant papers
- add an entry to CHANGES
- add yourself to AUTHORS
Addressing the changes is no guarantee that the PR is accepted; addressing them is a necessary but not sufficient condition. I'd also like to have @xiki-tempula look at it and @mrshirts if he can spare some time (mainly on the "is it worth it", based on the papers you cited)
src/alchemlyb/estimators/ti_.py
Outdated
_gq_delta_f_ : DataFrame, optional | ||
The estimated dimensionless cumulative sum of free energy up to each state | ||
with gaussian quadrature. | ||
|
||
_gq_d_delta_f_ : DataFrame, optional | ||
The estimated statistical uncertainty (one standard deviation) in | ||
the cumulative sum of free energy up to each state with gaussian quadrature |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will users want to access these data structures? If yes, no underscore and document them as you do.
If they are private attributes then they don't need to be documented.
src/alchemlyb/estimators/ti_.py
Outdated
|
||
_gq_d_delta_f_ : DataFrame, optional | ||
The estimated statistical uncertainty (one standard deviation) in | ||
the cumulative sum of free energy up to each state with gaussian quadrature | ||
|
||
.. versionchanged:: 1.0.0 | ||
`delta_f_`, `d_delta_f_`, `states_` are view of the original object. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'd need to add a versionchanged.
src/alchemlyb/estimators/ti_.py
Outdated
|
||
.. versionchanged:: 1.0.0 | ||
`delta_f_`, `d_delta_f_`, `states_` are view of the original object. | ||
|
||
""" | ||
|
||
def __init__(self, verbose=False): | ||
def __init__(self, verbose=False, gauss_qua=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer written out keywords, eg. quadrature
.
src/alchemlyb/estimators/ti_.py
Outdated
self.verbose = verbose | ||
self.gauss_qua = gauss_qua |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use_gaussian_quadrature
to be clear
src/alchemlyb/estimators/ti_.py
Outdated
lambdas = means.reset_index()[means.index.names[:]].iloc[:].values.sum(axis=1) | ||
num_lambdas = len(lambdas) | ||
# check if the lambda values used match the common ones suggested in Amber manual and assign cooresponding weights | ||
if num_lambdas==1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid code repetition and that many elif statements. Make the code much more compact and extendable by storing the special points and weights in a data structure
special_points = {1: [[0.5], [1.0]],
2: [[0.21132, 0.78867], [0.5, 0.5]],
3: [[0.1127, 0.5, 0.88729], [0.27777, 0.44444, 0.27777]]
...}
and have one code path that pulls its points from the special_points
.
src/alchemlyb/estimators/ti_.py
Outdated
if num_lambdas==1: | ||
try: | ||
reference_lambdas = np.round([0.5], 4) | ||
assert (lambdas == reference_lambdas).all() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert is not the correct way to check input. If there's anything wrong with user data you need to raise an exception, probably a ValueError.
Furthermore, never use ==
to check floating point number equality. Instead use np.allclose()
or similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good to add a new estimator in parallel to the current TI instead of adding it to TI. If you need something that is in TI, might be good to abstract out a BaseTI class.
Sorry, I don't have much time this/next week to review. I hope @xiki-tempula can have a look. I agree with the suggestion to make it a separate estimator instead of changing standard TI. |
separate TI estimator with gaussian quadrature
include TI_gq
Add some information
Thank you for the comments @orbeckst @xiki-tempula I may need some help to add tests to cover all code. I didn't find a testset in alchemtest for Amber gaussian quadrature (with suggested lambda values). Do I also need to upload an example to alchemtest for the purpose of testing and adding an example on TI documentation page? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this PR. It looks reasonable, I have had a quick first pass.
- We need some tests to test this method, you need to add them to alchemtest.
- Add documentation to explain what is TI_GQ and when and why do we want it.
- Abstract out the code that is the same between TI and TI_GQ as TIBase and inherit from it.
CHANGES
Outdated
@@ -13,6 +13,15 @@ The rules for this file: | |||
* release numbers follow "Semantic Versioning" https://semver.org | |||
|
|||
------------------------------------------------------------------------------ | |||
06/04/2023 hl2500 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*/*/* hl2500
As we don't know when to release the new version yet
CHANGES
Outdated
@@ -13,6 +13,15 @@ The rules for this file: | |||
* release numbers follow "Semantic Versioning" https://semver.org | |||
|
|||
------------------------------------------------------------------------------ | |||
06/04/2023 hl2500 | |||
|
|||
* 2.0.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Unreleased
|
||
class TI_gq(BaseEstimator, _EstimatorMixOut): | ||
"""Thermodynamic integration (TI) with gaussian quadrature estimation. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a bit of background of how is this different from the TI
The estimated dhdl of each state. | ||
|
||
|
||
.. versionchanged:: 2.0.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need this. Do versionadded
instead
# variance calculation assumes no correlation between points | ||
# used to calculate mean | ||
means = dHdl.groupby(level=dHdl.index.names[1:]).mean() | ||
variances = np.square(dHdl.groupby(level=dHdl.index.names[1:]).sem()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the part that is the same in TI and TI_GQ, abstract a TIBase class and put them there. We don't want code duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I modified this part slightly to use the separate_dhdl(self) function.
variances = np.square(dHdl.groupby(level=dHdl.index.names[1:]).sem()) | ||
|
||
# extract the lambda values used in the simulations | ||
lambdas = means.reset_index()[means.index.names[:]].iloc[:].values.sum(axis=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this safe for the dataframe that has more than one lambda values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this part to support multi-lambda situation. However, scaling multiple lambdas at the same time is not supported (e.g. (0.04691, 0.02544), (0.23076, 0.12923) ...) and it might not be the right way to perform the simulations
lambdas = means.reset_index()[means.index.names[:]].iloc[:].values.sum(axis=1) | ||
num_lambdas = len(lambdas) | ||
# suggested lambda values corresponding weights | ||
special_points = {1: [[0.5], [1.0]], 2: [[0.21132, 0.78867], [0.5, 0.5]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this part more compared with the previous version. It is better to use dispatch than a bunch of if else
|
||
return self | ||
|
||
def separate_dhdl(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the same as the one in TI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now this function is different to support multi-lambda situation.
update information
Updated
Thank you so much for the suggestions. I have added two testsets to alchemtest (PR: Add gaussian quadrature test) and made some changes. @orbeckst @xiki-tempula |
@hl2500 Thanks for the new test files updated in alchemtest. Do you mind test them in this PR? So I could know what are the test for and how they have been used. |
@xiki-tempula Okay. Could I ask how to implement test in this PR? Thanks. |
I think if you just change this line https://github.com/alchemistry/alchemlyb/blob/master/.github/workflows/ci.yaml#L58 |
add test
@xiki-tempula I have changed this line and some of the ubuntu tests failed. Could I ask what should I do? Thanks. |
Just Codecov acting weird. It is fine now. |
@xiki-tempula Thanks, could I ask what other tests I need to do? |
@hl2500 Thanks for the contribution, so the test should be in a format that is similar to https://github.com/alchemistry/alchemlyb/blob/master/src/alchemlyb/tests/test_ti_estimators.py, which is for the TI estimator. Which shall test if the result is numerically good. |
will add a new one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for continuously addressing my comments. The code part looks good to me now. We only need to polish the doc a bit and this PR should be good.
docs/estimators-ti.rst
Outdated
each :math:`\lambda` window, the values in the ``delta_f_`` and ``d_delta_f_`` tables are cumulative addition of estimation from one :math:`\lambda` window to another. | ||
To be consistent with :class:`~alchemlyb.estimators.TI` and other estimators, the diagonal values are set to zeros and two end states at :math:`\lambda` | ||
0 and 1 are added, although the simulation may not be performed at :math:`\lambda` 0 and 1. However, they are not considered to estimate the final results. | ||
For example, the values at :math:`\lambda` 0 is 0 and :math:`\lambda` 1 is the same as the previous gaussian quadrature point. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean However, they are not considered to estimate the final results.
, I was under the impression that the for lambda points [0.1127, 0.5, 0.88729]
, the result at point 0.88729
is the free energy difference between 0 and 1?
:math:
\lambda 0 is 0
is not the same as the previous gaussian quadrature point`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean
However, they are not considered to estimate the final results.
Sorry for the confusion. I meant that they didn't contribute to the final free energy values, as there were no estimations at lambda 0 and 1. I have removed this sentence. The next sentence should explain this.
@orbeckst Do you mind having a look at this PR as well? I think the code part is good and we just need to polish the doc a bit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks very much for all the effort. I think I'm quite happy with this PR now.
However, apologise that we cannot merge this in now as it is a significant and important addition. So I want to make sure that no one would have strong opposition to this PR.
I purpose that we leave the PR here for one month and if no one is feeling strongly about it, I will merge this in (aim for 2.2.0). What do you think @orbeckst ?
Okay, thank you so much for your guidance and time into this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the reference to the alchemtest PR and I will merge this PR next weekend. Thanks for waiting.
@xiki-tempula This part has been removed. Thank you! |
I have merge this PR thanks for all the contribution and the patience. |
@xiki-tempula Thank you again for your guidance in this work! |
@hl2500 No worries. Looking forward to further contribution from you. |
Added an optional gaussian quadrature functionality to estimate the TI integral.
This functionality could be triggered when initializing:
TI(gauss_qua=True)
The lambda values and weights were referred the Amber manual (page 493): http://ambermd.org/doc12/Amber22.pdf
The formula used to estimate the free energy and standard errors were referred the the Amber manual (eq. 25.2) and this literature (eq. 3): https://pubs.acs.org/doi/10.1021/acs.jcim.2c01052?ref=pdf, respectively.
The final two dataframes with free energy values and errors are upper triangluar matrix, because the values are cumulative sum upto each lambda state rather than differences. The diagonal values are the estimation at each lambda.
I have tested 12 lambdas and 9 lambdas with Amber one-step method to estimate the free energy and error of each leg in the thermodynamic cycle. The results are comparable to the ones estimated with trapezoidal rule. The other number of lambdas (1, 2, 3, 5, 7) should work as well.