-
Notifications
You must be signed in to change notification settings - Fork 67
Expand mask syntax #548
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expand mask syntax #548
Conversation
…Modify ArgList::ArgIsMask so that it looks for characters which are valid to start masks, not just mask characters in a string.
… Update dependencies.
…ill add option to return to original behavior later.
… distance selection by molecule
@drroe let me have a bit thinking first. |
Just quick question. Will you update mask selection for other programs in AT too? then ParmEd needs to update too. |
Saying
Overall, I think the change is pretty good. Will add some testing for pytraj for sure. |
I'll respond when I'm on my computer tonight. |
What Chimera does here is :: for chains, which is a scheme familiar to a non-negligible crowd of users and from their experience seems to have been working fine for quite some time. Any reason not to do the same? This could actually double as the molecule selector where integers select molecule numbers and letters select chains. The arguments for this approach are
The arguments against this approach are
If the arguments against are more persuasive, I think ^ is a reasonable character to use for molecules. In fact, I think point 3 above has pushed me away from this idea anyway (and in support of chain #). However, I do like :: better than :/ for chain selection. The former is what Chimera does and the latter will be rendered into an emoji whenever you try to send it to someone 😏
What would happen if someone used this syntax on a prmtop file that wasn't modified via As a general question, what happens when selecting based on information that doesn't exist? (Chain, original residue number, ... not sure if I'm missing one.) |
Thanks @hainm and @swails for the feedback! I'll address concerns in order.
I will try to make this a priority, I don't have a lot of time for development these days. There are quite a few programs that would need to be updated: parmed, sander/ambmask, pmemd (feels like there should be a common parser for those two), mdgx, nab, and probably some I'm missing. I'd at the very least create an Issue or bug report asking for these to be added :-).
Yes, it does mean that, it does work, and I'll add a test.
There is already a test for distance selection by molecule in Test_DistBasedMask (unless I'm not understanding your meaning).
I did try to implement this initially, and to make a long story short the way the cpptraj mask tokenizer currently works (character by character) this is not straightforward to do. I'd have to introduce some extra logic that says "if this character is
I would go farther and say that they're very different. In a non-insignificant fraction of PDBs I have encountered, chains span multiple molecules (for example in dimers/tetramers etc all molecules that make up a monomer are often given the same chain ID).
For chain ID, you'd get no atoms selected. For original residue number, if it's not present cpptraj fills it in with 1-based ordering (so e.g. residue number and original residue number are exactly the same for Amber topology files). To to summarize it seems that |
I mean "does this select CA atoms of molecule within 3 A from atom 5?" |
No, but to be fair it doesn't mean that for |
Turns out the tokenizer modification wasn't as bad as I thought. Chain ID selection is now |
No I have not. |
In that case I'll merge now - easy enough to change behavior back later if needed. |
At long last, trying to address #71.
This PR expands cpptraj's mask selection syntax by introducing the following:
^
character, e.g.would select residues named
ASP
in molecules 1 and 2. I decided on^
because it's highly unlikely to be used in an atom/residue name and it had no overlap with existing operators/operands. @swails, @hainm, @dacase, can you think of any reasons why not to use^
here?^
in distance-based mask instead of:
or@
, e.g.would select all molecules within 3.0 Angstroms of atom 5.
/
modifier for residue:
, e.g.would select all residues with chain ID
B
.;
modifier for residue:
, e.g.would select all residues originally numbered 2-4 and 8.
This also includes a slight change to how argument lists search for masks. Previously an argument could be considered a mask if it contained
@
,:
, or*
anywhere in the argument string. The new method only considers an argument string a mask if it contains@ : ^ *
or=
after any leading!
and(
characters. This is more robust, but can lead to situations where bad masks are ignored and effectively changed to*
. For example, with previous methodselect test@
fails withTokenize: Wrong syntax
becausetest@
is treated as a mask, but with the new methodtest@
is ignored andselect
assumes no mask present and selects everything. Not sure which behavior I like better and would be interested in opinions from @swails and @hainm.This PR also includes some cleanup of the MaskToken and MaskTokenArray classes, and adds tests for the new syntax.