Description
-
- What does this package do? (explain in 50 words or less)
Provides tokenizers for natural language.
-
- Paste the full DESCRIPTION file inside a code block below.
Package: tokenizers
Type: Package
Title: Tokenize Text
Version: 0.1.1
Date: 2016-04-03
Description: Convert natural language text into tokens. The tokenizers have a
consistent interface and are compatible with Unicode, thanks to being built
on the 'stringi' package. Includes tokenizers for shingled n-grams, skip
n-grams, words, word stems, sentences, paragraphs, characters, lines, and
regular expressions.
License: MIT + file LICENSE
LazyData: yes
Authors@R: c(person("Lincoln", "Mullen", role = c("aut", "cre"),
email = "lincoln@lincolnmullen.com"),
person("Dmitriy", "Selivanov", role = c("ctb"),
email = "selivanov.dmitriy@gmail.com"))
URL: https://github.com/lmullen/tokenizers
BugReports: https://github.com/lmullen/tokenizers/issues
RoxygenNote: 5.0.1
Depends:
R (>= 3.1.3)
Imports:
stringi (>= 1.0.1),
Rcpp (>= 0.12.3),
SnowballC (>= 0.5.1)
LinkingTo: Rcpp
Suggests: testthat
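
For illustration, a minimal usage sketch of the interface described in the Description field above. The `tokenize_words()` and `tokenize_sentences()` calls reflect the package's exported `tokenize_*` family; the sample text is invented and defaults may differ between versions.

```r
# Assumes the CRAN release of tokenizers; function names are from the
# package's exported tokenize_*() family.
library(tokenizers)

docs <- c("The quick brown fox jumps over the lazy dog.",
          "Tokenizers split text. They return one element per document.")

# Words: returns a list of character vectors, one per input document
tokenize_words(docs)

# Sentences: same input, same output shape
tokenize_sentences(docs)
```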
-
- URL for the package (the development repository, not a stylized html page)
https://github.com/lmullen/tokenizers
-
- What data source(s) does it work with (if applicable)?
Natural language text
-
- Who is the target audience?
Users of R packages for natural language processing (NLP).
-
- Are there other R packages that accomplish the same thing? If so, what is different about yours?
Virtually every R package for NLP implements a few tokenizers of its own. The point of this package is to collect all the tokenizers one could conceivably want to use in a single package, and to make sure they all have a consistent interface. The package also aims to provide fast and correct tokenizers implemented on top of stringi and Rcpp.
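
As a sketch of what that consistent interface looks like in practice (the `n = 2` argument and the token-counting helper below are illustrative additions, not from the submission): every tokenizer takes a character vector and returns a list with one element per input text, so the functions can be swapped without changing the surrounding code.

```r
library(tokenizers)

texts <- c("One two three four.", "Five six seven.")

# Each tokenizer has the same shape: character vector in, list out,
# with one element per input document.
tokenize_words(texts)
tokenize_ngrams(texts, n = 2)   # shingled n-grams
tokenize_characters(texts)

# Because the interface is uniform, tokenizers can be passed around freely.
# count_tokens() is a hypothetical helper for illustration only.
count_tokens <- function(tokenizer, x) vapply(tokenizer(x), length, integer(1))
count_tokens(tokenize_words, texts)
count_tokens(tokenize_characters, texts)
```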
-
- Check the box next to each policy below, confirming that you agree. These are mandatory.
- This package does not violate the Terms of Service of any service it interacts with.
- The repository has continuous integration with Travis CI and/or another service
- The package contains a vignette
- The package contains a reasonably complete README with `devtools` install instructions
- The package contains unit tests
- The package only exports functions to the NAMESPACE that are intended for end users
-
- Do you agree to follow the rOpenSci packaging guidelines? These aren't mandatory, but we strongly suggest you follow them. If you disagree with anything, please explain.
- Are there any package dependencies not on CRAN?
- Do you intend for this package to go on CRAN?
- Does the package have a CRAN accepted license?
- Did `devtools::check()` produce any errors or warnings? If so paste them below.
-
- Please add explanations below for any exceptions to the above:
The package does not contain a vignette, but it does contain an extensive README. Since all the tokenizers work in the same basic way, a vignette seems unnecessary. But if rOpenSci wants one, I can easily adapt the README into a standalone vignette.
This package is already on CRAN.
-
- If this is a resubmission following rejection, please explain the change in circumstances.