Add ability to pass user-specified 1D data sets to PDB file output. #724

drroe · 2019-05-20T18:28:14Z

Should address #721. This allows e.g. printing PDB files with B-factors calculated from atomicfluct among other things. Adds several new keywords to PDB trajectory output:

	bfacdata <set> : Use data in <set> for B-factor column.
	occdata <set>  : Use data in <set> for occupancy column.
	bfacbyres      : If specified assume X values in B-factor data set are residue numbers.
	occbyres       : If specified assume X values in occupancy data set are residue numbers.
	bfacscale      : If specified scale values in B-factor column between 0 and <bfacmax>.
	occscale       : If specified scale values in occupancy column between 0 and <occmax>.
	bfacmax <max>  : Max value for bfacscale.
	occmax <max>   : Max value for occscale.
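
To illustrate what the `bfacbyres`/`occbyres` keywords imply, here is a minimal sketch of expanding a 1D data set into one value per atom. It is not cpptraj's implementation: `XYData`, `columnValues`, and `resNumOfAtom` are hypothetical names, and both the fallback of 0.0 for atoms without a matching X value and the reading of X as 1-based atom numbers when the keyword is absent are assumptions.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a 1D data set: a list of (X, Y) pairs.
using XYData = std::vector<std::pair<double, double>>;

// Return one column value per atom.
// resNumOfAtom[i] is the 1-based residue number of atom i.
// If byRes is true, X values are treated as residue numbers;
// otherwise they are treated as 1-based atom numbers (assumption).
// Atoms with no matching X value keep defaultValue (assumption).
std::vector<double> columnValues(const XYData& data,
                                 const std::vector<int>& resNumOfAtom,
                                 bool byRes, double defaultValue = 0.0)
{
  std::map<int, double> lookup;
  for (const auto& xy : data)
    lookup[static_cast<int>(xy.first)] = xy.second;

  std::vector<double> out(resNumOfAtom.size(), defaultValue);
  for (std::size_t i = 0; i < resNumOfAtom.size(); ++i) {
    int key = byRes ? resNumOfAtom[i] : static_cast<int>(i) + 1;
    auto it = lookup.find(key);
    if (it != lookup.end()) out[i] = it->second;
  }
  return out;
}
```

With a per-residue set, every atom of a residue receives that residue's value, which is the behavior the `byres` keywords describe.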

The following is an example of how it can be used with atomicfluct:

trajin ../tz2.nc
rms :2-12 first
atomicfluct A0 :2-12 bfactor
average crdset MyAvg
run
crdout MyAvg fluct.2.pdb bfacdata A0

Also adds tests and updates the manual.

drroe · 2019-05-20T18:56:44Z

Jenkins failure is pytraj-related; the call to AddTrajout needs to be updated. I can try to open up a concurrent PR for pytraj that fixes this, but it's tricky because that PR won't pass until this PR is merged, so the order of operations is a little complicated.

hainm · 2019-05-20T20:18:25Z

> the call to AddTrajout needs to be updated.

Is it because of the changes below?

- int TrajoutList::AddTrajout(std::string const& filename, ArgList const& argIn, Topology* tParm)
+ int TrajoutList::AddTrajout(std::string const& filename, ArgList const& argIn, DataSetList const& DSLin, Topology* tParm)

I have several questions (just for discussion; I can have pytraj follow cpptraj's change).

Does C++ have optional arguments?
Can cpptraj have both functions, with and without the DSLin argument, if that's not complicated?

drroe · 2019-05-20T22:54:57Z

> Does C++ have optional arguments?

Yes, they're called "default" arguments I think. So you can e.g. define something like this:

#include <cstdio>

int myfunction(int x, int y, int z = 0) {
  printf("X=%i Y=%i Z=%i\n", x, y, z);
  return 0; // a non-void function must return a value
}

and calling like so

myfunction(10, 20);

produces X=10 Y=20 Z=0 while calling like so

myfunction(10,20,30);

produces X=10 Y=20 Z=30

> Can cpptraj have both functions, with and without the DSLin argument, if that's not complicated?

It's not, and I can. My original reason for not allowing this was that if dataset-related arguments are passed in via the ArgList and a blank DataSetList is given, the arguments will fail when maybe they shouldn't. However, maybe I'm overthinking it. I'll look at it again tomorrow with fresh eyes.
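
A minimal sketch of keeping both forms side by side, with the old signature forwarding a blank `DataSetList` to the new one. The `ArgList`, `DataSetList`, and `Topology` types here are toy stand-ins, not the real cpptraj classes, and the `bfacdata` lookup is a simplified illustration of the failure mode described above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins (assumptions); the real cpptraj classes are far richer.
struct ArgList { std::vector<std::string> args; };
struct DataSetList { std::vector<std::string> names; };
struct Topology {};

class TrajoutList {
  public:
    // New form: output trajectories can look up data sets by name.
    int AddTrajout(std::string const& filename, ArgList const& argIn,
                   DataSetList const& DSLin, Topology* tParm)
    {
      if (filename.empty() || tParm == nullptr) return 1;
      // Toy check: 'bfacdata <set>' requires <set> to exist in DSLin.
      for (std::size_t i = 0; i + 1 < argIn.args.size(); ++i) {
        if (argIn.args[i] == "bfacdata") {
          bool found = false;
          for (auto const& name : DSLin.names)
            if (name == argIn.args[i + 1]) found = true;
          if (!found) return 1;
        }
      }
      ++nTraj_;
      return 0;
    }
    // Old form: forwards a blank DataSetList to the new form.
    int AddTrajout(std::string const& filename, ArgList const& argIn,
                   Topology* tParm)
    {
      return AddTrajout(filename, argIn, DataSetList(), tParm);
    }
    int Size() const { return nTraj_; }
  private:
    int nTraj_ = 0;
};
```

In this sketch, a caller using the old form with a `bfacdata` argument gets an error because the blank list cannot contain the requested set, while callers that pass no dataset-related arguments are unaffected.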

hainm · 2019-05-21T04:01:59Z

> However, maybe I'm overthinking it. I'll look at it again tomorrow with fresh eyes.

Yeah, please. I have the impression that this is too much (87 files changed) for adding new data to the B-factor column. :D (Of course, my judgment is shallow, based only on the description and the number of files :D). g9 bro.

drroe · 2019-05-21T12:10:49Z

> I have the impression that this is too much (87 files changed) for adding new data to the B-factor column.

So there are two reasons there were so many changes. First, letting output trajectories know about data sets is necessary to get the B-factor etc. functionality working. Since output trajectories are used all over the place, this involves modifying a lot of function calls. I would probably be doing myself a favor by wrapping all of those arguments into a class (similar to how I made the ActionInit etc. classes that wrap arguments to Actions to make it easier to add/remove arguments to the various interface functions). It may be worth doing this time around, but it will involve lots of work on my end (and maybe yours), with the end benefit that future changes shouldn't break things so badly. So maybe that's for a future PR.
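
The argument-wrapping idea can be sketched as follows. `TrajoutInit` and `SetupTrajout` are hypothetical names chosen by analogy with ActionInit; nothing here is taken from the cpptraj source, and the three struct types are empty placeholders.

```cpp
#include <string>

// Empty placeholder types (assumptions).
struct ArgList {};
struct DataSetList {};
struct Topology {};

// Hypothetical bundle of everything an output trajectory needs at setup.
class TrajoutInit {
  public:
    TrajoutInit(ArgList const& a, DataSetList const& d, Topology* t)
      : args_(&a), dsl_(&d), top_(t) {}
    ArgList const& Args()    const { return *args_; }
    DataSetList const& DSL() const { return *dsl_; }
    Topology* Top()          const { return top_; }
  private:
    ArgList const* args_;
    DataSetList const* dsl_;
    Topology* top_;
};

// Interface functions take the bundle. Adding a member to TrajoutInit
// later leaves this signature, and any function that only passes the
// bundle along, unchanged.
int SetupTrajout(std::string const& filename, TrajoutInit const& init)
{
  if (filename.empty() || init.Top() == nullptr) return 1;
  return 0;
}
```

Only the code that constructs the bundle has to change when a new argument is introduced, which is the benefit described above for future changes.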

The second is that I'm experimenting with using forward declarations as a way of speeding up compile time. This involves reorganizing a lot of include directives, which is why there are so many header changes. It did actually improve the compile time by a couple of seconds, so going forward I'll probably make more use of them.
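
For readers unfamiliar with the technique, here is a minimal single-file sketch of a forward declaration. `TrajectoryWriter` is a hypothetical class and this `DataSetList` is a toy stand-in; in a real project the first half would live in a header and the second half in a source file.

```cpp
#include <string>

// --- What would live in the header ---
// Forward declaration: sufficient for references and pointers, so the
// header does not need to #include the full DataSetList definition.
class DataSetList;

class TrajectoryWriter {   // hypothetical class
  public:
    int Setup(std::string const& filename, DataSetList const& dsl);
    int NumSets() const { return nsets_; }
  private:
    int nsets_ = 0;
};

// --- What would live in the source file ---
// The full definition is only needed where members are actually used.
class DataSetList {        // toy stand-in
  public:
    explicit DataSetList(int n) : n_(n) {}
    int Size() const { return n_; }
  private:
    int n_;
};

int TrajectoryWriter::Setup(std::string const& filename, DataSetList const& dsl)
{
  if (filename.empty()) return 1;
  nsets_ = dsl.Size();
  return 0;
}
```

Because files that include such a header no longer pull in the full class definition, a change to that definition triggers fewer recompilations, which is where a compile-time saving can come from.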

hainm · 2019-05-21T13:02:14Z

Thanks for the explanation.

make[1]: Entering directory '/iscratch/jenkins-cuda/workspace/amber-github/cpptraj/test'
--------------------------------------------------------------------------
The value of the MCA parameter "plm_rsh_agent" was set to a path
that could not be found:

  plm_rsh_agent: ssh : rsh

Please either unset the parameter, or check that the path is correct

drroe · 2019-05-21T14:52:34Z

This pull request introduces 3 alerts when merging 10d0e35 into 8a604ef - view on LGTM.com

new alerts:

3 for FIXME comment

Comment posted by LGTM.com

drroe · 2019-05-21T16:49:55Z

This pull request introduces 3 alerts when merging fbee75a into 8a604ef - view on LGTM.com

new alerts:

3 for FIXME comment

Comment posted by LGTM.com

drroe · 2019-05-21T16:53:10Z

@hainm for now, I've re-introduced the old form of AddTrajout() for pytraj. It would be good if pytraj could support the dataset-related arguments in the future, though.

hainm · 2019-05-21T16:57:49Z

thanks @drroe

slochower · 2020-03-26T18:02:41Z

Did this feature (e.g., the addition of bfacdata) make it into any of the AmberTools 19.x releases on conda? Following the above example, I can create the bfactor data set but trying to write the data to a PDB file doesn't work -- the field is all zeros. I'm basically trying the snippet from the first post verbatim (#724 (comment)) and wondering whether bfacdata is just not implemented yet (I haven't tried to compile from git. I'm using cpptraj version 4.14.0 and AmberTools 19.09).

rms :90-1650 first
atomicfluct A0 :90-1650 bfactor out bfactor.dat
# I can confirm the data set is in `bfactor.dat`
average crdset MyAvg
run
crdout MyAvg xxx-protein-ligand-bfactor.pdb bfacdata A0

drroe · 2020-03-26T19:34:45Z

> Did this feature (e.g., the addition of bfacdata) make it into any of the AmberTools 19.x releases on conda?
> I'm using cpptraj version 4.14.0 and AmberTools 19.09

Unfortunately, this feature didn't make it in until 4.14.4. If possible, you can use the version direct from GitHub to get this functionality right away. I'm not sure what the timeline is for releasing the next version on conda.

slochower · 2020-03-26T19:53:38Z

Got it -- thanks. I tried to quickly compile a version I cloned from GitHub earlier today, but immediately ran into problems with BZLIB missing. I thought installing bzip2 from conda-forge would fix this, but it didn't and I didn't troubleshoot further. I'll just wait until this gets integrated into AmberTools unless there's a simple fix to bring in dependencies (N.B. I'm working on a cluster where I have both a system-wide AMBER installation and a local conda-based installation of AmberTools, but I can't easily install system packages.)

drroe · 2020-03-26T22:45:25Z

> ran into problems with BZLIB missing.

If you don't need to read bzip2 files natively, just configure with -nobzlib.

slochower · 2020-03-27T00:04:33Z

> If you don't need to read bzip2 files natively, just configure with -nobzlib.

Of course, whoops. Yup, this worked as intended and I tested the script and B-factors are being written properly. Thanks!

Despite doing the RMS fit, I get quite high values, but they seem suitable for my purposes. Many are in the range 20-50, but I do see some values near 1000, and they increase towards the C-terminus. That makes sense, since the C-terminus is definitely flopping around in the simulation, but I still wouldn't have expected such high numbers.

drroe · 2020-03-27T18:44:42Z

> but I still wouldn't have expected such high numbers

It's important to keep in mind that these aren't really "B-factors" in the standard (i.e. crystallographic) sense. Unless you're doing crystal simulations, the environment in which you would get experimental B-factors likely differs quite a bit from the environment you're running your MD simulations in. The numbers you're getting out of this calculation are just normalized atomic fluctuations. For proteins in solution, comparing to something like NMR order parameters (if they're available) is a better bet.
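
To put the reported numbers in perspective, here is a small sketch that converts between a mean-square fluctuation and a B-factor-style value. It assumes the conventional relation B = (8π²/3)⟨Δr²⟩, which, as far as I understand the cpptraj manual, is the weighting the `bfactor` keyword of atomicfluct applies; the function names are hypothetical.

```cpp
#include <cmath>

// B-factor-style value from a mean-square fluctuation in Angstrom^2,
// using the assumed relation B = (8 * pi^2 / 3) * msf.
double bfactorFromMsf(double msf)
{
  const double pi = std::acos(-1.0);
  return (8.0 / 3.0) * pi * pi * msf;
}

// Inverse: RMS fluctuation in Angstrom implied by a B-factor-style value.
double rmsfFromBfactor(double b)
{
  const double pi = std::acos(-1.0);
  return std::sqrt(b * 3.0 / (8.0 * pi * pi));
}
```

Under that relation, values of 20 to 50 correspond to RMS fluctuations of roughly 0.9 to 1.4 Å, and a value near 1000 corresponds to roughly 6 Å, which is plausible for a freely moving terminus.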

slochower · 2020-03-27T23:45:11Z

Great point & I agree. Thanks, Dan.

Daniel R. Roe added 22 commits May 17, 2019 13:59

DRR - Cpptraj: Start using forward declares.

17ce9a8

DRR - Cpptraj: Test adding DataSetList to processWriteArgs

40a4ef7

DRR - Add bfacdata keyword.

c31018f

DRR - Cpptraj: Update definition of processWriteArgs

f126354

DRR - Cpptraj: Fix dependencies for TrajectoryIO classes

cbfb937

DRR - Cpptraj: Finish fixing up dependencies. Ensure master data set …

342a374

…list properly passed in to output traj setup routines.

Merge branch 'master' into pdb-custom-bfactor

9754a72

DRR - Cpptraj: Add printout for b-factor data

5234894

DRR - Cpptraj: Test for writing bfacdata

ea743b5

DRR - Cpptraj: Change variable name

1545d72

DRR - Cpptraj: Split out DataSet assignment into separate function so…

d55dd09

… can potentially be used with the occupancy column

DRR - Cpptraj: Add 'occdata' keyword and clean up output

03cfb43

DRR - Cpptraj: Add occdata test

93710e2

DRR - Cpptraj: Add bfacscale and occscale keywords

a5b8bd0

DRR - Cpptraj: Add occmax and bfacmax keywords.

b323470

DRR - Cpptraj: Add test for scaling

24ef753

Merge branch 'master' into pdb-custom-bfactor

85fa406

DRR - Cpptraj: Update manual.

53d47c2

DRR - Cpptraj: Revision bump for new PDB format keywords bfacdata etc

886f853

DRR - Cpptraj: Add bfacbyres and occbyres keywords.

88e0967

DRR - Cpptraj: Test bfacbyres

cdef02c

DRR - Cpptraj: Update manual

817725c

drroe added the enhancement label May 20, 2019

drroe self-assigned this May 20, 2019

DRR - Cpptraj: Use more forward declarations

a0b1bd4

Daniel R. Roe added 4 commits May 21, 2019 08:33

DRR - Cpptraj: More forward declare stuff

8b4d1c8

DRR - Cpptraj: Add back old version of AddTrajout() with blank DataSe…

3ca7ae8

…tList to maintain pytraj compatibility.

DRR - Cpptraj: More forward declarations.

63e235f

DRR - Cpptraj: Forward declarations.

f6600c5

Daniel R. Roe added 5 commits May 21, 2019 09:11

DRR - Cpptraj: More forward declarations

4247bda

DRR - Cpptraj: One more round of dependency juggling

957ae4a

DRR - Cpptraj: Add missing includes for MPI build

1af375a

DRR - Cpptraj: Attempt to add parallel netcdf to jenkins MPI build.

cc9960b

DRR - Cpptraj: Give up on pnetcdf test for now. Will try again in a f…

fbee75a

…uture PR.

drroe merged commit a7d1611 into Amber-MD:master May 21, 2019

drroe deleted the pdb-custom-bfactor branch May 21, 2019 16:53

drroe mentioned this pull request May 22, 2019

Allow data sets to be used in the B-factor/occupancy column of PDB files. #721

Closed

drroe mentioned this pull request Jun 3, 2019

Modifications for ambpdb #728

Merged
