This repository was archived by the owner on May 19, 2021. It is now read-only.

Packages as research repositories/compendia  #11

@benmarwick

At last year's rOpenSci event we worked on a short guide to reproducible research, under @iamciera's guidance. Some of the most interesting progress on this topic since then has been on using the R package framework as a research repository or compendium for scholarly work, cf. @rmflight's blog posts, @cboettig's template package, @Pakillo's template package and @jhollist's manuscriptPackage, etc.

The concept of a research compendium has been around for a while (cf. Gentleman 2005, Gentleman & Temple Lang 2007, Stodden 2009, Leisch et al. 2011). Many of us are making custom R packages to accompany our research publications to improve reproducibility, but I think there are a bunch of open questions about the best ways to do this.

Perhaps at the unconf we can have a discussion to share some of the ways we're using R packages as research compendia, and draft a few guidelines to add to the guide. The goal would be to help domain scientists, especially those who are not primarily tool developers and are not already prolific package authors, get started with this. @hadley's book is of course an excellent resource on R packages generally, but using packages as research compendia raises some specialised questions that this rOpenSci group is uniquely qualified to tackle.

Some of the questions that I'd like to learn more about on this topic include:

  • How best to include data in the package? Or how best to link to data when it's too big to go in the package? RData files may be more efficient, but plain text formats are more accessible for reuse in other contexts.
  • How best to include the manuscript in the package? The package vignette seems like the obvious choice, but there are some limitations to that. For example, I cannot store the HTML output from the rendered Rmd in there. @cboettig's solution is to have a manuscript directory in the package, which is outside of the regular package framework and needs make to execute.
  • How best to control dependencies on other packages? Should we specify exact versions? Should we bundle the source of other packages with our package to maximize isolation and protect against changes in other packages that will break ours? Which of the numerous current potential solutions to this problem has the most promise (packrat, rbundler, checkpoint, gRAN, drat, etc.)? Perhaps these questions are a subset of Beyond CRAN: modern dependency management; including older/archived versions & alternative repositories #7
  • How to address dependencies external to R when presenting a package as a stand-alone research repository? A Docker image containing the package is one option some of us have pursued (cf. files on the docker image cboettig/nonparametric-bayes#55)
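
To make the "exact versions" question concrete, here is a rough sketch of what pinning might look like in a compendium's DESCRIPTION file. Everything in it is a placeholder for illustration: the package name, the chosen dependencies, and the version numbers are assumptions, not taken from any of the templates mentioned above. My understanding is that the `==` operator is parsed in version requirements, but it only checks what is already installed; it does not by itself fetch that version, which is part of why tools like packrat or checkpoint come up.

```
Package: mypaper2015
Title: Hypothetical Research Compendium (Illustrative Only)
Description: Placeholder compendium showing exact version pins.
    All names and version numbers here are assumptions.
Version: 0.1.0
Depends:
    R (>= 3.1.0)
Imports:
    ggplot2 (== 1.0.1),
    knitr (== 1.10)
Suggests:
    rmarkdown
VignetteBuilder: knitr
```

The trade-off is that strict pins like these make the package harder to install alongside other work, so a looser `>=` bound plus a separate record of the versions actually used may turn out to be the more practical guideline.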
