Skip to content

ORCA TDDFT and ROCIS Parsing #398

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Oct 13, 2017
Merged

ORCA TDDFT and ROCIS Parsing #398

merged 7 commits into from
Oct 13, 2017

Conversation

jevandezande
Copy link
Contributor

@jevandezande jevandezande commented Jul 21, 2017

This pull request updates the parsing of TDDFT and adds additional parsing for ROCIS. At the moment it parses all of the spectra, consistently overwriting until the final spectra is parsed. This is obviously less than optimal behavior, and can take a few seconds for larger systems.

Questions

  • Should I create a new set of regressions for ROCIS? It is very similar to TDDFT, but I need an open shell case to produce all the various different types of results to be printed.
  • What should be done about all of the different results being parsed? Just taking the last one currently breaks the tests.

TODO

  • Fix regression tests

Status

  • Ready to go

    Increases the amount of the TD header checked.
    Adds ROCIS test cases (currently not integrated into unit tests).
@langner
Copy link
Member

langner commented Jul 21, 2017

First of all, thanks for the contribution! Our convention has mostly been to parse the last (best?) results. So in this situation we would overwrite the attribute just as you've done, and add the regressions you need to cover ROCIS cases.

But, I would defer to Eric about the decision making here, I think he has more context and knowledge about these types of calculations. @berquist do you think parsing the final result is good here, or do we need to consider parsing multiple spectra for some reason?

Concerning the regression you broke - is that also ROCIS, causing it to fail?

@berquist
Copy link
Member

Should I create a new set of regressions for ROCIS? It is very similar to TDDFT, but I need an open shell case to produce all the various different types of results to be printed.

What we do for unrestricted test cases is take the same divinylbenzene geometry from restricted cases and make the cation doublet. This may be an appropriate time to start incrementally making restricted open-shell tests. See more on this later.

What should be done about all of the different results being parsed? Just taking the last one currently breaks the tests.

See the next paragraph.

But, I would defer to Eric about the decision making here, I think he has more context and knowledge about these types of calculations. @berquist do you think parsing the final result is good here, or do we need to consider parsing multiple spectra for some reason?

I think that for now, we should test against results comparable to what we already have, which is ABSORPTION SPECTRUM VIA TRANSITION ELECTRIC DIPOLE MOMENTS, the origin-dependent length form, so the first result. This should maintain consistency for the current attributes across all programs, though I am not sure if some programs currently parse the velocity form rather than the length form (Gaussian?).

However, I would like to expand the types of properties we can parse, since multiple programs can print multiple forms. See a related old PR #227 that parses electric length, electric velocity, and magnetic transition dipole moments from Gaussian. This is good incentive to figure out how these less general attributes or properties should be stored.

This is (to me) just a ROTDDFT calculation with the TDA/CIS approximation and scaling factors for some integral terms. I think this test can be made more general (across parsers) if the molecule is switched from a boron atom to DVB, and the rest of the input is changed to something like

! roks b3lyp sto-3g nori noautostart

% rocis
    nroots  5
    maxcore 2500
    maxdim  50
    dori    false
    soc     true
    dohighermult true
    dodftcis   	true
    doquad      true
    dftcis_c = 0.21, 0.49, 0.29
    orbwin = 0, 0, 0, 2000
end

* xyz 1 2
 C                  0.27867948   -1.36683162    0.00000000
 C                  1.32303041   -0.44173575    0.00000000
 C                  1.04434506    0.92484978    0.00000000
 C                 -0.27867948    1.36683162    0.00000000
 C                 -1.32303041    0.44173575    0.00000000
 C                 -1.04434506   -0.92484978    0.00000000
 H                  2.36595443   -0.79037726    0.00000000
 H                  1.86746094    1.65407997    0.00000000
 H                 -2.36595443    0.79037726    0.00000000
 H                 -1.86746094   -1.65407997    0.00000000
 C                 -0.58659169    2.87589931    0.00000000
 C                  0.36350188    3.80076420    0.00000000
 H                 -1.65647768    3.12394312    0.00000000
 H                  0.14429560    4.87693235    0.00000000
 H                  1.43338788    3.55272039    0.00000000
 C                  0.58659169   -2.87589931    0.00000000
 C                 -0.36350188   -3.80076420    0.00000000
 H                  1.65647768   -3.12394312    0.00000000
 H                 -0.14429560   -4.87693235    0.00000000
 H                 -1.43338788   -3.55272039    0.00000000
*

At least DALTON, Q-Chem, I think NWChem, and presumably Gaussian can perform ROCIS/ROTDDFT calculations, and I think we should try and write a common test for all those parsers. Without running calculations it isn't obvious to me how sensitive the results will be to changing the dftcis_c parameters, which would affect if a single common test could be written with reasonable error bars. Maybe @jevandezande could shed some light on this.

Fix regression tests

It's only the old unit tests now in the regression suite, so whatever the code fix is will fix both the unit and regression tests.

self.etoscs.append(osc)

# Parse the various absorption spectra for TDDFT and ROCIS
if 'ABSORPTION SPECTRUM' in line or 'ELECTRIC DIPOLE' in line:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would anything fail if this was changed to only if 'ABSORPTION SPECTRUM' in line:?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, the quadrupole corrections do not mention ABSORPTION SPECTRUM. They usually look like: COMBINED ELECTRIC DIPOLE + MAGNETIC DIPOLE + ELECTRIC QUADRUPOLE SPECTRUM.
Also, the X-Ray emission spectrum would be missed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that makes sense.

return energy, intensity

# Check for variations
if (line[16:24] == 'COMBINED' and not 'ROCIS' in line) or \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why this isn't the full string, say if line.strip() == 'COMBINED ELECTRIC DIPOLE + MAGNETIC DIPOLE + ELECTRIC QUADRUPOLE SPECTRUM'? Same for the others. If not, this is harder to see the logic for.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, maybe that is a better way to do it for most of them.

2 61793079.3 0.2 0.00000 0.00000 2.85949 0.00000 -0.00000 0.00000285948800 0.00000 0.00000 1.00000 0.00000 -0.00000
"""
try:
state, energy, wavelength, d2, m2, q2, dm, do, intensity, d2_contrib, m2_contrib, q2_contrib, dm_contrib, do_contrib = line.split()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be broken up into multiple lines?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the most compact yet readable format that I could think of. I'll see what I can do with multiple lines.

@langner
Copy link
Member

langner commented Jul 24, 2017

Thanks for the input Eric, I totally missed the connection to the gauge thing.

@jevandezande
Copy link
Contributor Author

The DFT/ROCIS I added is basically just ROCIS with DFT orbitals and some fudge factors. The scaling parameters included are the defaults; they have been optimized for B3LYP/def2-tzvp(-f) (Roemelt, M.; Neese, F. (2013) J. Phys. Chem. A, 117, 3069). If we wanted to simplify the test input, we could drop the DFT part and do standard ROCIS, which prints the exact same way. Since DVB is symmetric, the targeted core orbital must be localized, or the orbital window (OrbWin) needs to be expanded to include all symmetry equivalent orbitals.

I think saving all forms akin to PR #227 would be a great idea. This would make it easier to compare the effects of different approximations and also help protect the user from accidently grabbing the wrong absorption spectrum (that happened to me, which is what prompted the rewrite).

As a side note, such a module might be nice for thermochemistry, where there are a lot of small pieces that are spit out that someone may want to look at.

@berquist
Copy link
Member

berquist commented Aug 14, 2017

Ok, sorry for letting this sit for so long.

If we wanted to simplify the test input, we could drop the DFT part and do standard ROCIS, which prints the exact same way.

As long as the printing is exactly the same way, I'd like this, so it gives us a chance of reusing the test for other parsers. I discovered that Q-Chem can't do ROCIS, so this may be less relevant than I wanted.

Since DVB is symmetric, the targeted core orbital must be localized, or the orbital window (OrbWin) needs to be expanded to include all symmetry equivalent orbitals.

Can we get the same output from a non-XAS/valence-only calculation? I don't think other programs can do orbital window selection like ORCA, so they'd have to calculate all possible states, which might be expensive.

I think saving all forms akin to PR #227 would be a great idea. This would make it easier to compare the effects of different approximations and also help protect the user from accidently grabbing the wrong absorption spectrum (that happened to me, which is what prompted the rewrite).

A problem that I see here is that program-specific field names will be almost unavoidable as we move to less-general properties. Suggestions for what these should be named and how they're structured are welcome. If we're going to be program-specific, the overly verbose but safe solution is to use the section headers (X-RAY ABSORPTION SPECTRUM VIA TRANSITION ELECTRIC DIPOLE MOMENTS, etc.) as dictionary keys. That way, it's very clear what's been parsed without looking at the documentation.


If any of the above changes break possible added functionality and the test can't be made general, then I'd prefer the added functionality and an ORCA-specific test.

For now, if there is a section that's clearly identifiable as the preferable one, we should use that, and the general parsing strategy can be changed later when we make breaking changes.

As a side note, such a module might be nice for thermochemistry, where there are a lot of small pieces that are spit out that someone may want to look at.

If you could explain this in more detail (opening another issue for a question/discussion would be good), that would help our discussion about how "custom" or more complex attributes will be made user-facing in the future.

    Adds a dictionary containing all spectra computed with TDDFT, ROCIS, etc.
    Keys are the titles for the spectra.
    Values are a tuple (etenergies, etoscs).
@jevandezande
Copy link
Contributor Author

I added a transprop dictionary that has the title of each TDDFT, ROCIS, etc. calculation as the key and a tuple of (etenergies, etoscs) as the values. The etenergies and etoscs are then set to the last parsed spectrum. Thoughts?

As for testing, I added DVB with ROCIS and a test case subclassed from GenericTDTest. Then I overwrote most of the tests. I feel like keeping it with the TD test cases makes the most sense.

@langner
Copy link
Member

langner commented Aug 17, 2017

A problem that I see here is that program-specific field names will be almost unavoidable as we move to less-general properties. Suggestions for what these should be named and how they're structured are welcome. If we're going to be program-specific, the overly verbose but safe solution is to use the section headers (X-RAY ABSORPTION SPECTRUM VIA TRANSITION ELECTRIC DIPOLE MOMENTS, etc.) as dictionary keys. That way, it's very clear what's been parsed without looking at the documentation.

I think this could be dealt with easily if we expanded our concept of attribute. If used a custom Attribute class we could develop some convention (sub-attributes, lookups, dictionary-like behavior, whatever) that would cover less widespread variants or even program-specific results.

@berquist
Copy link
Member

Now I'm unsure of what to do about this PR, since I don't want to make @jevandezande redo a lot of this work. Should we figure out the new structure before merging this, or take the "best result" and alter the structure to use the new API later?

@@ -236,7 +236,8 @@ TD Gaussian GaussianTDDFTTest basicGaussian09 dvb_td.out
TD Jaguar JaguarTDDFTTest basicJaguar8.3 dvb_td.out
TD ORCA OrcaTDDFTTest basicORCA2.9 dvb_td.out
TD ORCA OrcaTDDFTTest basicORCA3.0 dvb_td.out
TD QChem OrcaTDDFTTest basicQChem4.2 dvb_td.out
TD ORCA OrcaROCISTest basicORCA4.0 dvb_rocis.out
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@langner I am fine with adding a new unit test as long as you are.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timeit says reading the file takes 9.9 ms on my laptop.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not so much timing as whether or not the test type is general enough, though we are clearly starting to move away from that, and correctness is the most important thing.

@jevandezande
Copy link
Contributor Author

Should the discussion of making a custom Attribute class be moved to a separate issue (it could potentially be used for other attributes)? Currently I only use ROCIS sparingly, so I am in no hurry to merge this code in.

@berquist
Copy link
Member

Should the discussion of making a custom Attribute class be moved to a separate issue (it could potentially be used for other attributes)? Currently I only use ROCIS sparingly, so I am in no hurry to merge this code in.

I was worried about that, so this is good to hear. I've made a new issue: #419.

@langner
Copy link
Member

langner commented Aug 20, 2017

Now I'm unsure of what to do about this PR, since I don't want to make @jevandezande redo a lot of this work. Should we figure out the new structure before merging this, or take the "best result" and alter the structure to use the new API later?

Well, I'd also like this particular output to be supported, since it's needed. I'm OK with any of the temporary solutions we outlined above, new attribute or a dictionary - please implement what you guys think is most appropriate. But, could you please add a comment that it is likely to change in the future. I don't think it'll happen very soon, but there's a change we'll get to 2.x in 2018.

@langner langner modified the milestones: v1.5.2, v1.5.3 Aug 25, 2017
@langner
Copy link
Member

langner commented Sep 6, 2017

@jevandezande Any idea if you'll have the time to work on this some time soon, like in September?

@jevandezande
Copy link
Contributor Author

jevandezande commented Sep 6, 2017 via email

@jevandezande
Copy link
Contributor Author

I've added a comment about transprop being liable to change in cclib 2.0 (boring talk at the conference). Is this where you would like the comment? What else do I need to do to before we can get this pull request merged in?

@berquist
Copy link
Member

berquist commented Oct 9, 2017

I'm happy with things as they are now. @langner?

@langner langner self-requested a review October 12, 2017 00:40
@langner
Copy link
Member

langner commented Oct 12, 2017

I'm OK with this as well. I'll leave the clicking to Eric.

@berquist berquist merged commit e34a5d9 into cclib:master Oct 13, 2017
@berquist
Copy link
Member

Thanks for tolerating us, I'm glad we have a contributor that's making us think about more complicated stuff.

@jevandezande jevandezande deleted the orca_spectra branch October 16, 2017 05:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants