
@timvdm (Member) commented Apr 10, 2020

Various improvements and fixes for the tautomer code.

  • Fix issue #227 (Bug in obtautomer)
  • Fix various backtracking issues (rewritten using RAII)
  • Perceive aromaticity for each tautomer (fixes failing antralin test)
  • Mark atoms that initially have all bonds assigned as Other (fixes the failing tenoxicam test, whose molecule contains a sulfonyl group)
  • Add support for enumerating unique tautomers only (duplicates may be present due to symmetry)
  • Add the test cases from the Daylight EMUG99 tautomer slides; all of them now pass (see https://www.daylight.com/meetings/emug99/Delany/taut_html/sld030.htm)
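The RAII rewrite of the backtracking can be illustrated with a minimal sketch. It does not reproduce Open Babel's tautomer code: `ToyState`, `ScopedBondOrder` and `Enumerate` are hypothetical names, and the search state is just a vector of bond orders. A guard object makes a tentative assignment in its constructor and undoes it in its destructor, so every exit from a search step restores the state, which is exactly where hand-written undo code tends to go wrong.

```cpp
#include <cstddef>
#include <functional>
#include <initializer_list>
#include <vector>

// Hypothetical toy search state: one bond order per bond (0 = unassigned).
struct ToyState {
  std::vector<int> bondOrder;
};

// RAII guard: sets bondOrder[idx] on construction and restores the previous
// value on destruction, so every way out of a scope (normal exit, early
// return, exception) undoes the tentative assignment.
class ScopedBondOrder {
public:
  ScopedBondOrder(ToyState &state, std::size_t idx, int order)
      : m_state(state), m_idx(idx), m_old(state.bondOrder[idx]) {
    m_state.bondOrder[m_idx] = order;
  }
  ~ScopedBondOrder() { m_state.bondOrder[m_idx] = m_old; }

  // Copying a guard would restore the old value twice, so forbid it.
  ScopedBondOrder(const ScopedBondOrder &) = delete;
  ScopedBondOrder &operator=(const ScopedBondOrder &) = delete;

private:
  ToyState &m_state;
  std::size_t m_idx;
  int m_old;
};

// Recursively tries order 1 and order 2 for each bond from `idx` onwards and
// calls `visit` on every complete assignment. No explicit undo is needed:
// each guard restores its bond when its loop iteration ends.
void Enumerate(ToyState &state, std::size_t idx,
               const std::function<void(const ToyState &)> &visit) {
  if (idx == state.bondOrder.size()) {
    visit(state);
    return;
  }
  for (int order : {1, 2}) {
    ScopedBondOrder guard(state, idx, order);
    Enumerate(state, idx + 1, visit);
  }
}
```

For three bonds this visits all eight assignments, and when `Enumerate` returns the state is back to all zeros without a single line of explicit undo code.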

TODO:

  • More testing with special functional groups (e.g., N-oxide, conjugated carbanion/carbocation, conjugated nitrogen)
  • Add support for keto/enol tautomerism?

* EnumerateTautomers(mol, functor);
* @endcode
*/
class OBAPI UniqueTautomerFunctor : public TautomerFunctor
Member

I'll mark this for 3.1 when we can add new API

Member Author

Should I make a separate version with only the bug fixes and no API changes for 3.0.1?

Member

If you think you can provide only the bug fixes with no API changes, that would be great. There's clearly interest in a 3.0.1 release.
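The `UniqueTautomerFunctor` declared in the diff excerpt above filters out duplicate tautomers. One way such a filter can work, sketched under the assumption that each tautomer can be reduced to a canonical string key (for example canonical SMILES): remember every key seen so far and forward only first occurrences. `ToyTautomerFunctor` and `ToyUniqueFunctor` are hypothetical names; the real Open Babel functor receives a molecule rather than a string, and its interface is not reproduced here.

```cpp
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for a tautomer callback: receives each enumerated
// tautomer as a canonical key (e.g. canonical SMILES).
class ToyTautomerFunctor {
public:
  virtual ~ToyTautomerFunctor() = default;
  virtual void operator()(const std::string &canonicalKey) = 0;
};

// Forwards a tautomer to `next` only the first time its canonical key is
// seen, so symmetry-equivalent duplicates are dropped.
class ToyUniqueFunctor : public ToyTautomerFunctor {
public:
  explicit ToyUniqueFunctor(std::function<void(const std::string &)> next)
      : m_next(std::move(next)) {}

  void operator()(const std::string &canonicalKey) override {
    // insert().second is true only if the key was not already present
    if (m_seen.insert(canonicalKey).second)
      m_next(canonicalKey);
  }

  std::size_t UniqueCount() const { return m_seen.size(); }

private:
  std::function<void(const std::string &)> m_next;
  std::set<std::string> m_seen;
};
```

The memory cost grows with the number of unique tautomers rather than the number enumerated; a `std::unordered_set` would serve equally well as the seen-set.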

@baoilleach (Member)

Brilliant. I'll test it out; I have a set of tautomers somewhere that I pulled out of ChEMBL. Regarding keto-enol, if you do add it, could you put it behind an option in case it leads to weird results? I have the impression from others that this may be the case.

@ghutchis (Member) commented May 3, 2020

@baoilleach, did you have a chance to test this? I'd like to merge for 3.1: even if it needs further improvement, it's a big step forward.

@baoilleach (Member) commented May 3, 2020 via email

@ghutchis ghutchis merged commit a40bfa6 into openbabel:master May 3, 2020
@baoilleach (Member)

Just to follow up (only a year too late): the new code is much improved indeed. However, I'm still able to find some test failures; e.g. S=c1[nH]c(c([nH]1)c1ccccc1)c1ccccc1 and Sc1[nH]c(c(n1)c1ccccc1)c1ccccc1 don't resolve to the same form. I can provide a longer list if @timvdm or anyone else is interested.

A separate issue is that it's not possible to use the functor from the Python API. I know that OEChem have solved this by somehow enabling Python functions to be passed in. An alternative would be to do as Tim did previously for automorphisms (I think), where there is an API function that either has a default functor or acts as a convenience function to just return a vector<OBMol*> of results (perhaps with an optional maxsize argument?).
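The convenience-function alternative can be sketched as follows, under stated assumptions: the hypothetical `ToyEnumerate` stands in for the tautomer enumerator and is assumed to stop when its callback returns false (whether the real enumerator supports early termination is not established here), and tautomers are represented as strings instead of `OBMol*`. Only the callback-free `CollectTautomers` (also hypothetical) would need to be exposed to Python.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical enumerator: calls `visit` for each tautomer and stops early
// as soon as `visit` returns false. A real enumerator would generate the
// tautomers from a molecule; here they are simply supplied as a list.
void ToyEnumerate(const std::vector<std::string> &tautomers,
                  const std::function<bool(const std::string &)> &visit) {
  for (const auto &t : tautomers)
    if (!visit(t))
      return;
}

// Convenience wrapper: collects results into a vector (straightforward to
// wrap for Python) and stops after `maxSize` results; 0 means "no limit".
std::vector<std::string> CollectTautomers(const std::vector<std::string> &input,
                                          std::size_t maxSize = 0) {
  std::vector<std::string> results;
  ToyEnumerate(input, [&](const std::string &t) {
    results.push_back(t);
    // keep going unless a limit is set and has been reached
    return maxSize == 0 || results.size() < maxSize;
  });
  return results;
}
```

Returning values keeps this sketch free of ownership questions; a real binding that returns `OBMol*` would need to settle who owns and deletes the molecules.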

@baoilleach baoilleach mentioned this pull request Sep 5, 2021
@timvdm (Member, Author) commented Sep 12, 2021

@baoilleach This issue is fixed in #2410

A longer list of examples is welcome if there are still issues. I'll have a look at the callback issue when I have time.

@baoilleach (Member) commented Sep 12, 2021

Great stuff @timvdm. I'll pull a list together.

Related to this, I've been looking for other implementations that may be of interest. Roger's is hidden behind his slides: https://www.daylight.com/meetings/emug99/Delany/tautomers/. Meanwhile, @johnmay has an implementation, discussed in his thesis, available on a CDK branch: https://github.com/cdk/cdk/tree/sd-tautomer/tool/tautomer/src/main/java/org/openscience/cdk/tautomer.

Note that for the step that assigns single/double bonds to the system, it should be possible to call the OBKekulize function with a suitably prepared OBMol. This function can kekulize any system that can be kekulized, and returns false otherwise.
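Assigning single/double bonds this way is a perfect-matching problem: each atom that needs a double bond must be paired with exactly one bonded neighbour that also needs one. The sketch below shows that formulation with plain backtracking; it is not OBKekulize's algorithm (which is not reproduced here), and `ToyKekulize` and its `needsDouble` input are hypothetical, standing in for the "properly prepared OBMol". It returns false when no assignment exists.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Kekulization as perfect matching: every atom flagged in `needsDouble`
// must get exactly one double bond, chosen from `bonds` whose two ends both
// need one. On success, returns true and fills `doubleBonds` with bond
// indices; returns false if no assignment exists.
// Plain backtracking: exponential in the worst case, fine for a sketch.
bool ToyKekulize(int numAtoms, const std::vector<std::pair<int, int>> &bonds,
                 const std::vector<bool> &needsDouble,
                 std::vector<std::size_t> &doubleBonds) {
  std::vector<bool> matched(numAtoms, false);
  doubleBonds.clear();

  std::function<bool()> solve = [&]() -> bool {
    // Pick the first atom that still needs a double bond.
    int atom = -1;
    for (int a = 0; a < numAtoms; ++a) {
      if (needsDouble[a] && !matched[a]) {
        atom = a;
        break;
      }
    }
    if (atom == -1)
      return true; // every atom that needs a double bond has one
    // Any solution must pair this atom with some eligible neighbour.
    for (std::size_t i = 0; i < bonds.size(); ++i) {
      const int u = bonds[i].first;
      const int v = bonds[i].second;
      if (u != atom && v != atom)
        continue;
      const int other = (u == atom) ? v : u;
      if (other == atom || !needsDouble[other] || matched[other])
        continue;
      matched[atom] = true;
      matched[other] = true;
      doubleBonds.push_back(i);
      if (solve())
        return true;
      doubleBonds.pop_back(); // undo and try the next bond
      matched[atom] = false;
      matched[other] = false;
    }
    return false; // this atom cannot be given a double bond
  };
  return solve();
}
```

For a six-membered ring where every atom needs a double bond this finds three alternating double bonds; for a three-atom chain it returns false, since an odd number of atoms cannot be perfectly matched.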

@ghutchis (Member)

@baoilleach (Member)

...the second of which has a contribution from me. :-)

@ghutchis (Member)

Oh, I hadn't noticed the "O'Boyle and Sayle" spreadsheet. 😉

My group has a longer-term project to use QM calculations to generate more tautomer preferences, but that's probably several months to a year out.

@baoilleach (Member) commented Sep 14, 2021

Hi @timvdm, apart from a logical error in my own code (which I need to think about), the OB code still had trouble with the following pairs:

*=Nc1[nH]cc2-c(c1)c1ccccc1n2,*=Nc1ncc2c(c1)c1ccccc1[nH]2
*=Nc1ccc(cc1)NC(Nc1ccccc1)=S,*=Nc1ccc(cc1)N=C(Nc1ccccc1)S
*=Nc1cccc(c1)NC(N*)=S,*=Nc1cccc(c1)NC(=N*)S
*=Nc1ccc(cc1)NC(=N*)S,*=Nc1ccc(cc1)NC(N*)=S

It also had trouble with protonated nitrogens, which hopefully is a simple atom-typing issue:

*N=CC(c1[nH]ncn1)=CN=[N+](c1ccccc1)*,*N=CC(c1n[nH]cn1)=CN=[N+](c1ccccc1)*
*NC(c1ccc[n+](c1)*)=O,*N=C(c1ccc[n+](c1)*)O
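Each line above is a pair of SMILES (with `*` attachment points) that should give the same canonical tautomer. A minimal harness for running such a list is sketched below, assuming a hypothetical `canonicalTautomer` callback that maps a SMILES string to a canonical tautomer string; how that callback would be implemented with Open Babel is left open. Each line is split at the first comma, which is safe because commas are not part of SMILES syntax.

```cpp
#include <cstddef>
#include <functional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical callback type: SMILES in, canonical tautomer string out.
using Canonicalizer = std::function<std::string(const std::string &)>;

// Reads "smilesA,smilesB" lines and returns the pairs whose canonical
// tautomers differ. Lines without a comma are skipped.
std::vector<std::pair<std::string, std::string>>
FindMismatches(const std::string &text, const Canonicalizer &canonicalTautomer) {
  std::vector<std::pair<std::string, std::string>> failures;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    const std::size_t comma = line.find(',');
    if (comma == std::string::npos)
      continue; // blank or malformed line
    const std::string a = line.substr(0, comma);
    const std::string b = line.substr(comma + 1);
    if (canonicalTautomer(a) != canonicalTautomer(b))
      failures.emplace_back(a, b);
  }
  return failures;
}
```

Passing an identity function as the callback reports every pair whose two strings differ, which is a quick way to check the harness itself before plugging in real tautomer canonicalization.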


3 participants