Description
Summary
- What does this package do? (explain in 50 words or less):
R implementation of the PhyLoTa pipeline. It identifies orthologous sequences in GenBank using BLAST searches. It consists of three automated stages: taxise, download and cluster.
- Paste the full DESCRIPTION file inside a code block below:
Package: phylotaR
Type: Package
Title: Automated phylogenetic sequence cluster identification from GenBank
Version: 0.1
Date: 2017-11-17
Author: Hannes Hettling, Rutger Vos, Alexander Zika, Dominic J. Bennett, Alexandre Antonelli
Maintainer: D.J. Bennett <dominic.john.bennett@gmail.com>
Description: PhyLoTa identifies orthologous sequence clusters from data readily available in GenBank. This is an R implementation of this process.
License: GPL-2
URL: https://github.com/dombennett/phylotaR#readme
BugReports: https://github.com/dombennett/phylotaR/issues
SystemRequirements: BLAST+ (>2.7.1)
Depends:
    R (>= 3.3.0),
    methods,
    foreach
Imports:
    curl,
    doMC,
    doSNOW,
    igraph,
    CHNOSZ,
    rentrez,
    DBI,
    taxize,
    XML
Suggests:
    testthat,
    knitr,
    rmarkdown
RoxygenNote: 6.0.1
VignetteBuilder: knitr
- URL for the package (the development repository, not a stylized html page): https://github.com/DomBennett/phylotaR
- Please indicate which category or categories from our package fit policies this package falls under and why (e.g., data retrieval, reproducibility). If you are unsure, we suggest you make a pre-submission inquiry:
Data retrieval: it downloads sequences from GenBank and identifies sequence clusters that may be of use for phylogenetic analysis.
- Who is the target audience and what are scientific applications of this package?
Evolutionary biologists and phylogeneticists. Determining sequence orthology is the essential first step of any phylogenetic analysis. Usually, a phylogeneticist searches GenBank for sequences of the same gene, downloads them and aligns them. Due to naming problems, however, potentially relevant sequences may be missed, or mislabelled sequences may be downloaded. PhyLoTa gets around this by performing all-v-all BLAST searches rather than relying on gene names. PhyLoTa is often the first step of large-scale automated phylogenetic pipelines (e.g. SUPERSMART). Potentially, this R package could be central to many future phylogenetic analyses.
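To illustrate how clusters can be derived from all-v-all BLAST hits without using gene names, here is a minimal sketch in base R. It assumes the hits have already been reduced to a two-column table of query/subject pairs, and it groups sequences by single-linkage clustering (any chain of hits joins sequences into one cluster). The `hits` table, the sequence IDs and the `cluster_hits()` function are hypothetical names for illustration only; this is not necessarily how phylotaR implements its cluster stage.

```r
# Hypothetical BLAST hits: one row per significant query/subject pair.
hits <- data.frame(
  query   = c("seqA", "seqB", "seqD", "seqF"),
  subject = c("seqB", "seqC", "seqE", "seqF"),
  stringsAsFactors = FALSE
)

# Single-linkage clustering: sequences connected by any chain of hits
# end up in the same cluster (connected components of the hit graph).
cluster_hits <- function(hits) {
  ids <- sort(unique(c(hits$query, hits$subject)))
  # Start with every sequence in its own cluster.
  label <- setNames(seq_along(ids), ids)
  repeat {
    changed <- FALSE
    for (i in seq_len(nrow(hits))) {
      q <- hits$query[i]
      s <- hits$subject[i]
      # Give both ends of the hit the smaller of their two labels.
      lo <- min(label[[q]], label[[s]])
      if (label[[q]] != lo || label[[s]] != lo) {
        label[[q]] <- lo
        label[[s]] <- lo
        changed <- TRUE
      }
    }
    # Stop once a full pass over the hits changes nothing.
    if (!changed) break
  }
  # Return a list of clusters, each a character vector of sequence IDs.
  unname(split(names(label), label))
}

clusters <- cluster_hits(hits)
```

For large hit tables, a graph library would be a more practical choice; for example, igraph (already listed under Imports) provides `components()` for the same grouping.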
- Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?
The only current approach for identifying PhyLoTa clusters is the PhyLoTa browser. The data underlying its current release, however, is five years old. In those five years there have been over 40 million new sequences added to GenBank. An alternative is needed to make use of the latest data. Additionally, phylotaR would allow a user to specify their own parameters when running the pipeline.
- If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
Requirements
Confirm each of the following by checking the box. This package:
- does not violate the Terms of Service of any service it interacts with.
- has a CRAN and OSI accepted license.
- contains a README with instructions for installing the development version.
- includes documentation with examples for all functions.
- contains a vignette with examples of its essential functions and uses.
- has a test suite.
- has continuous integration, including reporting of test coverage, using services such as Travis CI, Coveralls and/or CodeCov.
- I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.
Publication options
- Do you intend for this package to go on CRAN?
- Do you wish to automatically submit to the Journal of Open Source Software? If so:
- The package has an obvious research application according to JOSS's definition.
- The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
- The package is deposited in a long-term repository with the DOI:
- (Do not submit your package separately to JOSS)
- Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
- The package is novel and will be of interest to the broad readership of the journal.
- The manuscript describing the package is no longer than 3000 words.
- You intend to archive the code for the package in a long-term repository which meets the requirements of the journal.
- (Please do not submit your package separately to Methods in Ecology and Evolution)
Detail
- Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:
No errors, but plenty of warnings to do with documentation.
- Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
Not just yet. Outstanding: conversion to snake_case, generating the README from Rmd, creating a website with pkgdown, adding a NEWS file, switching to Authors@R syntax, moving Depends to Imports, and replacing cat with message (although is that necessary given the logging methods?).
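As a rough sketch of the Authors@R item, the Author and Maintainer fields above could be expressed with `utils::person()`. The names and the maintainer's email come from the DESCRIPTION file; the roles are assumptions ("aut" for every listed author, plus "cre" for the maintainer) and would need to be confirmed by the authors.

```r
# Sketch of the value for the Authors@R field in DESCRIPTION.
# Roles are assumed: "aut" = author, "cre" = maintainer (creator).
authors <- c(
  person("Hannes", "Hettling", role = "aut"),
  person("Rutger", "Vos", role = "aut"),
  person("Alexander", "Zika", role = "aut"),
  person("Dominic J.", "Bennett", role = c("aut", "cre"),
         email = "dominic.john.bennett@gmail.com"),
  person("Alexandre", "Antonelli", role = "aut")
)

# Print the author list roughly as it would appear in package metadata.
print(authors)
```

In the DESCRIPTION file itself, the `c(...)` expression would follow `Authors@R:` and replace the separate Author and Maintainer fields.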
- If this is a resubmission following rejection, please explain the change in circumstances:
- If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:
Pre-submission Note
Hi!
I've opened this pre-submission issue to ask whether phylotaR would be something you'd like added to your roster. On our side, we would be very keen, as we think future phylogenetic analyses will depend on making software open-source, modular and collaborative. I did feel, however, that it may not quite fit into the rOpenSci family, because it's a pipeline and it depends on software external to R (BLAST). Please be aware that phylotaR is still under development and a lot of testing still needs to be done. If you are interested, I'd be happy to update the naming style (I prefer camelCase), improve the documentation and do whatever else is needed to meet your criteria.
Thanks,
Dom