Skip to content

Conversation

drroe
Copy link
Contributor

@drroe drroe commented May 20, 2019

Should address #721. This allows e.g. printing PDB files with B-factors calculated from atomicfluct among other things. Adds several new keywords to PDB trajectory output:

	bfacdata <set> : Use data in <set> for B-factor column.
	occdata <set>  : Use data in <set> for occupancy column.
	bfacbyres      : If specified assume X values in B-factor data set are residue numbers.
	occbyres       : If specified assume X values in occupancy data set are residue numbers.
	bfacscale      : If specified scale values in B-factor column between 0 and <bfacmax>.
	occscale       : If specified scale values in occupancy column between 0 and <occmax>.
	bfacmax <max>  : Max value for bfacscale.
	occmax <max>   : Max value for occscale.

The following is an example of how it can be used with atomicfluct:

trajin ../tz2.nc
rms :2-12 first
atomicfluct A0 :2-12 bfactor
average crdset MyAvg
run
crdout MyAvg fluct.2.pdb bfacdata A0

Also adds tests and updates the manual.

@drroe drroe self-assigned this May 20, 2019
@drroe
Copy link
Contributor Author

drroe commented May 20, 2019

Jenkins failure is pytraj-related; the call to AddTrajout needs to be updated. I can try to open up a concurrent PR for pytraj that fixes this, but it's tricky because that PR won't pass until this PR is merged, so the order of operations is a little complicated.

@hainm
Copy link
Contributor

hainm commented May 20, 2019

the call to AddTrajout needs to be updated.

is it because of below changes

- int TrajoutList::AddTrajout(std::string const& filename, ArgList const& argIn, Topology* tParm)
+ int TrajoutList::AddTrajout(std::string const& filename, ArgList const& argIn, DataSetList const& DSLin, Topology* tParm)

I have several questions (just for discussion, I can have pytraj follow cpptraj's change).

  • Does c++ have optional argument?
  • Can cpptraj have both function with/without DSLin argument if that's not complicated?

@drroe
Copy link
Contributor Author

drroe commented May 20, 2019

Does c++ have optional argument?

Yes, they're called "default" arguments I think. So you can e.g. define something like this:

int myfunction(int x, int y, int z = 0) {
  printf("X=%i Y=%i Z=%i\n", x, y, z);
}

and calling like so

myfunction(10, 20);

produces X=10 Y=20 Z=0 while calling like so

myfunction(10,20,30);

produces X=10 Y=20 Z=30

Can cpptraj have both function with/without DSLin argument if that's not complicated?
It's not, and I can. My original logic for not allowing this was if dataset-related arguments are passed in via the ArgList and a blank DataSetList is given, the arguments will fail when maybe they shouldn't. However, maybe I'm overthinking it. I'll look at it again tomorrow with fresh eyes.

@hainm
Copy link
Contributor

hainm commented May 21, 2019

However, maybe I'm overthinking it. I'll look at it again tomorrow with fresh eyes.

yeah, please. I have impression that this is too much (87 files changed) for adding new data to b-factor column. :D (of course, I have very shallow judge just from the description + the number of files :D). g9 bro.

@drroe
Copy link
Contributor Author

drroe commented May 21, 2019

I have impression that this is too much (87 files changed) for adding new data to b-factor column.

So there are two reasons there were so many changes. First, letting output trajectories know about data sets is necessary to get the B-factor etc functionality working. Since output trajectories are used all over the place, this involves modifying a lot of function calls. I would probably be doing myself a favor by wrapping all of those arguments into a class (similar to how I made a ActionInit etc. classes that wrap arguments to Actions to make it easier to add/remove arguments to the various interface functions. It may be that it's worth doing this time around but will involve lots of work on my end (and maybe yours), with the end benefit that future changes shouldn't break things so badly. So maybe that's for a future PR.

The second is that I'm experimenting with using forward declarations as a way of speeding up compile time. This involves reorganizing a lot of include directives, which is why there are so many header changes. It did actually improve the compile time by a couple of seconds, so going forward I'll probably make more use of them.

@hainm
Copy link
Contributor

hainm commented May 21, 2019 via email

Daniel R. Roe added 5 commits May 21, 2019 09:11
make[1]: Entering directory '/iscratch/jenkins-cuda/workspace/amber-github/cpptraj/test'
  --------------------------------------------------------------------------
The value of the MCA parameter "plm_rsh_agent" was set to a path
that could not be found:

  plm_rsh_agent: ssh : rsh

Please either unset the parameter, or check that the path is correct
@drroe
Copy link
Contributor Author

drroe commented May 21, 2019

This pull request introduces 3 alerts when merging 10d0e35 into 8a604ef - view on LGTM.com

new alerts:

  • 3 for FIXME comment

Comment posted by LGTM.com

@drroe
Copy link
Contributor Author

drroe commented May 21, 2019

This pull request introduces 3 alerts when merging fbee75a into 8a604ef - view on LGTM.com

new alerts:

  • 3 for FIXME comment

Comment posted by LGTM.com

@drroe
Copy link
Contributor Author

drroe commented May 21, 2019

@hainm for now, I've re-introduced the old form of AddTrajout() for pytraj. It will be good if in the future pytraj can support the dataset related arguments though.

@drroe drroe merged commit a7d1611 into Amber-MD:master May 21, 2019
@drroe drroe deleted the pdb-custom-bfactor branch May 21, 2019 16:53
@hainm
Copy link
Contributor

hainm commented May 21, 2019

thanks @drroe

@slochower
Copy link
Contributor

Did this feature (e.g., the addition of bfacdata) make it into any of the AmberTools 19.x releases on conda? Following the above example, I can create the bfactor data set but trying to write the data to a PDB file doesn't work -- the field is all zeros. I'm basically trying the snippet from the first post verbatim (#724 (comment)) and wondering whether bfacdata is just not implemented yet (I haven't tried to compile from git. I'm using cpptraj version 4.14.0 and AmberTools 19.09).

rms :90-1650 first
atomicfluct A0 :90-1650 bfactor out bfactor.dat
# I can confirm the data set is in `bfactor.dat`
average crdset MyAvg
run
crdout MyAvg xxx-protein-ligand-bfactor.pdb bfacdata A0

@drroe
Copy link
Contributor Author

drroe commented Mar 26, 2020

Did this feature (e.g., the addition of bfacdata) make it into any of the AmberTools 19.x releases on conda?
I'm using cpptraj version 4.14.0 and AmberTools 19.09).

Unfortunately this feature didn't make it in until 4.14.4. If possible, you can use the version direct from GitHub to get this functionality right away. I'm not sure what the timeline is for releasing the next version on conda is.

@slochower
Copy link
Contributor

Got it -- thanks. I tried to quickly compile a version I cloned from GitHub earlier today, but immediately ran into problems with BZLIB missing. I thought installing bzip2 from conda-forge would fix this, but it didn't and I didn't troubleshoot further. I'll just wait until this gets integrated into AmberTools unless there's a simple fix to bring in dependencies (N.B. I'm working on a cluster where I have both a system-wide AMBER installation and a local conda-based installation of AmberTools, but I can't easily install system packages.)

@drroe
Copy link
Contributor Author

drroe commented Mar 26, 2020

ran into problems with BZLIB missing.

If you don't need to read bzip2 files natively just configure with -nobzlib

@slochower
Copy link
Contributor

If you don't need to read bzip2 files natively just configure with -nobzlib

Of course, whoops. Yup, this worked as intended and I tested the script and B-factors are being written properly. Thanks!

Despite doing the RMS fit, I get quite high values, but they seem suitable for my purposes (many in the range 20-50, but I do see some values near ~1000, and it looks like things just start to increase towards the end of the C-terminus -- which makes sense as that is definitely flopping around in the simulation -- but I still wouldn't have expected such high numbers).

@drroe
Copy link
Contributor Author

drroe commented Mar 27, 2020

but I still wouldn't have expected such high numbers

It's important to keep in mind that these aren't really "B-factors" in the standard (i.e. crystallographic) sense. Unless you're doing crystal simulations, the environment in which you would get experimental B-factors likely differs quite a bit from the environment you're running your MD simulations in. The numbers you're getting out of this calculation are just normalized atomic fluctuations. For proteins in solution comparing to something like NMR order parameters (if they're available) is a better bet.

@slochower
Copy link
Contributor

Great point & I agree. Thanks, Dan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants