@lcoombe lcoombe commented Oct 7, 2024

  • If a simulated read is longer than the selected chromosome, preferentially select a replacement sequence from the same species
  • Previous behaviour randomly selected the replacement sequence from the full pool of sequences across all species
  • In the example of the Zymo mock model, this meant that S. aureus reads ended up under-represented, while Cryptococcus was over-represented
    • This was because the S. aureus reference contained 4 sequences - the main circular genome and 3 short plasmids
    • When an S. aureus sequence was randomly selected, it was frequently one of the plasmids, which were shorter than the requested read length
    • Because the Cryptococcus reference contains more sequences than the other references, randomly choosing a replacement from the entire pool of species/sequences meant that Cryptococcus was chosen more frequently
  • To retain the requested abundances as much as possible, preferentially choose the 'alternative' sequence from the same species
    • This is consistent with the version used in the meta-NanoSim paper
  • As a fall-back, if there are no appropriate sequences in the species' reference, choose another species
    • If this is required, a warning will be printed, advising the user to check the abundances after simulation finishes
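
The selection order described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: the function name `pick_replacement`, the `refs` layout (species mapped to sequence names and lengths), the uniform random choice among candidates, and the warning text are all assumptions made for the example.

```python
import random
import warnings


def pick_replacement(species, read_len, refs, rng=random):
    """Pick a (species, sequence) pair long enough for a read of read_len.

    Hypothetical helper: `refs` maps species name -> {sequence name: length}.
    Candidates are drawn uniformly here, which is an assumption.
    """
    # First choice: any sufficiently long sequence from the same species.
    same = [(species, name)
            for name, length in refs[species].items()
            if length >= read_len]
    if same:
        return rng.choice(same)

    # Fall-back: sufficiently long sequences from any other species.
    others = [(sp, name)
              for sp, seqs in refs.items() if sp != species
              for name, length in seqs.items()
              if length >= read_len]
    if not others:
        raise ValueError(f"No reference sequence is at least {read_len} bp long")

    warnings.warn(
        f"No sequence in {species} is long enough for a {read_len} bp read; "
        "using another species. Check the abundances after simulation finishes."
    )
    return rng.choice(others)
```

With a made-up reference shaped like the Zymo case (one long chromosome plus three short plasmids for S. aureus, many sequences for Cryptococcus), a 10 kb read requested for S. aureus always resolves to the S. aureus chromosome instead of drifting to Cryptococcus.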

@lcoombe lcoombe merged commit b49fb03 into master Oct 7, 2024
@lcoombe lcoombe deleted the meta_abun_fix branch October 7, 2024 17:51