ORCA TDDFT and ROCIS Parsing #398

jevandezande · 2017-07-21T08:34:00Z

This pull request updates the parsing of TDDFT and adds additional parsing for ROCIS. At the moment it parses all of the spectra, consistently overwriting until the final spectra is parsed. This is obviously less than optimal behavior, and can take a few seconds for larger systems.

Questions

Should I create a new set of regressions for ROCIS? It is very similar to TDDFT, but I need an open shell case to produce all the various different types of results to be printed.
What should be done about all of the different results being parsed? Just taking the last one currently breaks the tests.

TODO

Fix regression tests

Status

Ready to go

Increases the amount of the TD header checked. Adds ROCIS test cases (currently not integrated into unit tests).

langner · 2017-07-21T16:52:58Z

First of all, thanks for the contribution! Our convention has mostly been to parse the last (best?) results. So in this situation we would overwrite the attribute just as you've done, and add the regressions you need to cover ROCIS cases.

But, I would defer to Eric about the decision making here, I think he has more context and knowledge about these types of calculations. @berquist do you think parsing the final result is good here, or do we need to consider parsing multiple spectra for some reason?

Concerning the regression you broke - is that also ROCIS, causing it to fail?

berquist · 2017-07-24T04:05:37Z

Should I create a new set of regressions for ROCIS? It is very similar to TDDFT, but I need an open shell case to produce all the various different types of results to be printed.

What we do for unrestricted test cases is take the same divinylbenzene geometry from restricted cases and make the cation doublet. This may be an appropriate time to start incrementally making restricted open-shell tests. See more on this later.

What should be done about all of the different results being parsed? Just taking the last one currently breaks the tests.

See the next paragraph.

But, I would defer to Eric about the decision making here, I think he has more context and knowledge about these types of calculations. @berquist do you think parsing the final result is good here, or do we need to consider parsing multiple spectra for some reason?

I think that for now, we should test against results comparable to what we already have, which is ABSORPTION SPECTRUM VIA TRANSITION ELECTRIC DIPOLE MOMENTS, the origin-dependent length form, so the first result. This should maintain consistency for the current attributes across all programs, though I am not sure if some programs currently parse the velocity form rather than the length form (Gaussian?).

However, I would like to expand the types of properties we can parse, since multiple programs can print multiple forms. See a related old PR #227 that parses electric length, electric velocity, and magnetic transition dipole moments from Gaussian. This is good incentive to figure out how these less general attributes or properties should be stored.

This is (to me) just a ROTDDFT calculation with the TDA/CIS approximation and scaling factors for some integral terms. I think this test can be made more general (across parsers) if the molecule is switched from a boron atom to DVB, and the rest of the input is changed to something like

! roks b3lyp sto-3g nori noautostart

% rocis
    nroots  5
    maxcore 2500
    maxdim  50
    dori    false
    soc     true
    dohighermult true
    dodftcis   	true
    doquad      true
    dftcis_c = 0.21, 0.49, 0.29
    orbwin = 0, 0, 0, 2000
end

* xyz 1 2
 C                  0.27867948   -1.36683162    0.00000000
 C                  1.32303041   -0.44173575    0.00000000
 C                  1.04434506    0.92484978    0.00000000
 C                 -0.27867948    1.36683162    0.00000000
 C                 -1.32303041    0.44173575    0.00000000
 C                 -1.04434506   -0.92484978    0.00000000
 H                  2.36595443   -0.79037726    0.00000000
 H                  1.86746094    1.65407997    0.00000000
 H                 -2.36595443    0.79037726    0.00000000
 H                 -1.86746094   -1.65407997    0.00000000
 C                 -0.58659169    2.87589931    0.00000000
 C                  0.36350188    3.80076420    0.00000000
 H                 -1.65647768    3.12394312    0.00000000
 H                  0.14429560    4.87693235    0.00000000
 H                  1.43338788    3.55272039    0.00000000
 C                  0.58659169   -2.87589931    0.00000000
 C                 -0.36350188   -3.80076420    0.00000000
 H                  1.65647768   -3.12394312    0.00000000
 H                 -0.14429560   -4.87693235    0.00000000
 H                 -1.43338788   -3.55272039    0.00000000
*

At least DALTON, Q-Chem, I think NWChem, and presumably Gaussian can perform ROCIS/ROTDDFT calculations, and I think we should try and write a common test for all those parsers. Without running calculations it isn't obvious to me how sensitive the results will be to changing the dftcis_c parameters, which would affect if a single common test could be written with reasonable error bars. Maybe @jevandezande could shed some light on this.

Fix regression tests

It's only the old unit tests now in the regression suite, so whatever the code fix is will fix both the unit and regression tests.

berquist · 2017-07-24T04:10:14Z

src/cclib/parser/orcaparser.py

-                self.etoscs.append(osc)
+
+        # Parse the various absorption spectra for TDDFT and ROCIS
+        if 'ABSORPTION SPECTRUM' in line or 'ELECTRIC DIPOLE' in line:


Would anything fail if this was changed to only if 'ABSORPTION SPECTRUM' in line:?

Yep, the quadrupole corrections do not mention ABSORPTION SPECTRUM. They usually look like: COMBINED ELECTRIC DIPOLE + MAGNETIC DIPOLE + ELECTRIC QUADRUPOLE SPECTRUM.
Also, the X-Ray emission spectrum would be missed.

Ok, that makes sense.

berquist · 2017-07-24T04:12:31Z

src/cclib/parser/orcaparser.py

+                return energy, intensity
+
+            # Check for variations
+            if (line[16:24] == 'COMBINED' and not 'ROCIS' in line) or \


Is there a reason why this isn't the full string, say if line.strip() == 'COMBINED ELECTRIC DIPOLE + MAGNETIC DIPOLE + ELECTRIC QUADRUPOLE SPECTRUM'? Same for the others. If not, this is harder to see the logic for.

Yeah, maybe that is a better way to do it for most of them.

berquist · 2017-07-24T04:13:01Z

src/cclib/parser/orcaparser.py

+   2 61793079.3      0.2      0.00000         0.00000         2.85949         0.00000        -0.00000         0.00000285948800         0.00000         0.00000         1.00000         0.00000         -0.00000
+"""
+                    try:
+                        state, energy, wavelength, d2, m2, q2, dm, do, intensity, d2_contrib, m2_contrib, q2_contrib, dm_contrib, do_contrib = line.split()


Can this be broken up into multiple lines?

This was the most compact yet readable format that I could think of. I'll see what I can do with multiple lines.

langner · 2017-07-24T04:53:10Z

Thanks for the input Eric, I totally missed the connection to the gauge thing.

jevandezande · 2017-07-24T09:23:43Z

The DFT/ROCIS I added is basically just ROCIS with DFT orbitals and some fudge factors. The scaling parameters included are the defaults; they have been optimized for B3LYP/def2-tzvp(-f) (Roemelt, M.; Neese, F. (2013) J. Phys. Chem. A, 117, 3069). If we wanted to simplify the test input, we could drop the DFT part and do standard ROCIS, which prints the exact same way. Since DVB is symmetric, the targeted core orbital must be localized, or the orbital window (OrbWin) needs to be expanded to include all symmetry equivalent orbitals.

I think saving all forms akin to PR #227 would be a great idea. This would make it easier to compare the effects of different approximations and also help protect the user from accidently grabbing the wrong absorption spectrum (that happened to me, which is what prompted the rewrite).

As a side note, such a module might be nice for thermochemistry, where there are a lot of small pieces that are spit out that someone may want to look at.

berquist · 2017-08-14T23:05:35Z

Ok, sorry for letting this sit for so long.

If we wanted to simplify the test input, we could drop the DFT part and do standard ROCIS, which prints the exact same way.

As long as the printing is exactly the same way, I'd like this, so it gives us a chance of reusing the test for other parsers. I discovered that Q-Chem can't do ROCIS, so this may be less relevant than I wanted.

Since DVB is symmetric, the targeted core orbital must be localized, or the orbital window (OrbWin) needs to be expanded to include all symmetry equivalent orbitals.

Can we get the same output from a non-XAS/valence-only calculation? I don't think other programs can do orbital window selection like ORCA, so they'd have to calculate all possible states, which might be expensive.

I think saving all forms akin to PR #227 would be a great idea. This would make it easier to compare the effects of different approximations and also help protect the user from accidently grabbing the wrong absorption spectrum (that happened to me, which is what prompted the rewrite).

A problem that I see here is that program-specific field names will be almost unavoidable as we move to less-general properties. Suggestions for what these should be named and how they're structured are welcome. If we're going to be program-specific, the overly verbose but safe solution is to use the section headers (X-RAY ABSORPTION SPECTRUM VIA TRANSITION ELECTRIC DIPOLE MOMENTS, etc.) as dictionary keys. That way, it's very clear what's been parsed without looking at the documentation.

If any of the above changes break possible added functionality and the test can't be made general, then I'd prefer the added functionality and an ORCA-specific test.

For now, if there is a section that's clearly identifiable as the preferable one, we should use that, and the general parsing strategy can be changed later when we make breaking changes.

As a side note, such a module might be nice for thermochemistry, where there are a lot of small pieces that are spit out that someone may want to look at.

If you could explain this in more detail (opening another issue for a question/discussion would be good), that would help our discussion about how "custom" or more complex attributes will be made user-facing in the future.

Adds a dictionary containing all spectra computed with TDDFT, ROCIS, etc. Keys are the titles for the spectra. Values are a tuple (etenergies, etoscs).

jevandezande · 2017-08-16T12:31:36Z

I added a transprop dictionary that has the title of each TDDFT, ROCIS, etc. calculation as the key and a tuple of (etenergies, etoscs) as the values. The etenergies and etoscs are then set to the last parsed spectrum. Thoughts?

As for testing, I added DVB with ROCIS and a test case subclassed from GenericTDTest. Then I overwrote most of the tests. I feel like keeping it with the TD test cases makes the most sense.

langner · 2017-08-17T19:09:34Z

A problem that I see here is that program-specific field names will be almost unavoidable as we move to less-general properties. Suggestions for what these should be named and how they're structured are welcome. If we're going to be program-specific, the overly verbose but safe solution is to use the section headers (X-RAY ABSORPTION SPECTRUM VIA TRANSITION ELECTRIC DIPOLE MOMENTS, etc.) as dictionary keys. That way, it's very clear what's been parsed without looking at the documentation.

I think this could be dealt with easily if we expanded our concept of attribute. If used a custom Attribute class we could develop some convention (sub-attributes, lookups, dictionary-like behavior, whatever) that would cover less widespread variants or even program-specific results.

berquist · 2017-08-18T17:45:09Z

Now I'm unsure of what to do about this PR, since I don't want to make @jevandezande redo a lot of this work. Should we figure out the new structure before merging this, or take the "best result" and alter the structure to use the new API later?

berquist · 2017-08-18T17:46:15Z

test/testdata

@@ -236,7 +236,8 @@ TD        Gaussian    GaussianTDDFTTest     basicGaussian09       dvb_td.out
 TD        Jaguar      JaguarTDDFTTest       basicJaguar8.3        dvb_td.out
 TD        ORCA        OrcaTDDFTTest         basicORCA2.9          dvb_td.out
 TD        ORCA        OrcaTDDFTTest         basicORCA3.0          dvb_td.out
-TD        QChem       OrcaTDDFTTest         basicQChem4.2         dvb_td.out
+TD        ORCA        OrcaROCISTest         basicORCA4.0          dvb_rocis.out


@langner I am fine with adding a new unit test as long as you are.

Timeit says reading the file takes 9.9 ms on my laptop.

It's not so much timing as whether or not the test type is general enough, though we are clearly starting to move away from that, and correctness is the most important thing.

jevandezande · 2017-08-18T18:07:48Z

Should the discussion of making a custom Attribute class be moved to a separate issue (it could potentially be used for other attributes)? Currently I only use ROCIS sparingly, so I am in no hurry to merge this code in.

berquist · 2017-08-18T18:14:22Z

Should the discussion of making a custom Attribute class be moved to a separate issue (it could potentially be used for other attributes)? Currently I only use ROCIS sparingly, so I am in no hurry to merge this code in.

I was worried about that, so this is good to hear. I've made a new issue: #419.

langner · 2017-08-20T23:00:30Z

Now I'm unsure of what to do about this PR, since I don't want to make @jevandezande redo a lot of this work. Should we figure out the new structure before merging this, or take the "best result" and alter the structure to use the new API later?

Well, I'd also like this particular output to be supported, since it's needed. I'm OK with any of the temporary solutions we outlined above, new attribute or a dictionary - please implement what you guys think is most appropriate. But, could you please add a comment that it is likely to change in the future. I don't think it'll happen very soon, but there's a change we'll get to 2.x in 2018.

langner · 2017-09-06T08:33:17Z

@jevandezande Any idea if you'll have the time to work on this some time soon, like in September?

jevandezande · 2017-09-06T09:42:47Z

I'm currently at a conference, but should be able to work on stuff again early next week.

…

On Wed, Sep 6, 2017, 10:33 Karol M. Langner ***@***.***> wrote: @jevandezande <https://github.com/jevandezande> Any idea if you'll have the time to work on this some time soon, like in September? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#398 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABzcCM5eA30RifnREK2nGKh_UFVU8vqeks5sfljOgaJpZM4OfHQO> .

jevandezande · 2017-09-07T08:24:02Z

I've added a comment about transprop being liable to change in cclib 2.0 (boring talk at the conference). Is this where you would like the comment? What else do I need to do to before we can get this pull request merged in?

berquist · 2017-10-09T02:53:59Z

I'm happy with things as they are now. @langner?

langner · 2017-10-12T00:41:43Z

I'm OK with this as well. I'll leave the clicking to Eric.

berquist · 2017-10-13T17:09:20Z

Thanks for tolerating us, I'm glad we have a contributor that's making us think about more complicated stuff.

jevandezande added 2 commits July 21, 2017 08:48

First hack at TDDFT and ROCIS absorption spectrum parsing.

800fba8

Improves TD parsing

b8b5224

Increases the amount of the TD header checked. Adds ROCIS test cases (currently not integrated into unit tests).

langner added ORCA parsers labels Jul 21, 2017

langner requested a review from berquist July 21, 2017 16:43

berquist requested changes Jul 24, 2017

View reviewed changes

Made spectra matching easier to read.

86557bc

berquist mentioned this pull request Aug 14, 2017

gaussianparser: et transition dipoles #227

Closed

jevandezande added 2 commits August 15, 2017 16:35

First hack at transprop.

05b3782

Adds a dictionary containing all spectra computed with TDDFT, ROCIS, etc. Keys are the titles for the spectra. Values are a tuple (etenergies, etoscs).

Adds ROCIS test.

0cb502f

berquist reviewed Aug 18, 2017

View reviewed changes

berquist mentioned this pull request Aug 18, 2017

Structure of future attributes #419

Open

langner modified the milestones: v1.5.2, v1.5.3 Aug 25, 2017

jevandezande added 2 commits September 7, 2017 09:20

Adds documentations of transprop.

fd4f5c5

Simplifies parsing of tdfft with DoQuad.

b5c38fa

berquist approved these changes Oct 9, 2017

View reviewed changes

langner self-requested a review October 12, 2017 00:40

langner approved these changes Oct 12, 2017

View reviewed changes

berquist merged commit e34a5d9 into cclib:master Oct 13, 2017

jevandezande deleted the orca_spectra branch October 16, 2017 05:49

ORCA TDDFT and ROCIS Parsing #398

ORCA TDDFT and ROCIS Parsing #398

Uh oh!

Conversation

jevandezande commented Jul 21, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Questions

TODO

Status

Uh oh!

langner commented Jul 21, 2017

Uh oh!

berquist commented Jul 24, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

langner commented Jul 24, 2017

Uh oh!

jevandezande commented Jul 24, 2017

Uh oh!

berquist commented Aug 14, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jevandezande commented Aug 16, 2017

Uh oh!

langner commented Aug 17, 2017

Uh oh!

berquist commented Aug 18, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jevandezande commented Aug 18, 2017

Uh oh!

berquist commented Aug 18, 2017

Uh oh!

langner commented Aug 20, 2017

Uh oh!

langner commented Sep 6, 2017

Uh oh!

jevandezande commented Sep 6, 2017 via email

Uh oh!

jevandezande commented Sep 7, 2017

Uh oh!

berquist commented Oct 9, 2017

Uh oh!

langner commented Oct 12, 2017

Uh oh!

berquist commented Oct 13, 2017

Uh oh!

Uh oh!

jevandezande commented Jul 21, 2017 •

edited

Loading

berquist commented Aug 14, 2017 •

edited

Loading