
Tokenizers package #33

@lmullen

1. What does this package do? (explain in 50 words or less)

Provides tokenizers for natural language.
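
A minimal sketch of the intended usage, based on the tokenizer families listed in the DESCRIPTION below; the exact arguments and the output shown in comments are indicative rather than definitive:

```r
library(tokenizers)

text <- "The quick brown fox jumps over the lazy dog."

# Word tokenizer: lowercases and strips punctuation by default
tokenize_words(text)
#> [[1]]
#> [1] "the" "quick" "brown" "fox" "jumps" "over" "the" "lazy" "dog"

# Shingled n-grams from the same input
tokenize_ngrams(text, n = 2)
```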

2. Paste the full DESCRIPTION file inside a code block below.

```
Package: tokenizers
Type: Package
Title: Tokenize Text
Version: 0.1.1
Date: 2016-04-03
Description: Convert natural language text into tokens. The tokenizers have a
    consistent interface and are compatible with Unicode, thanks to being built
    on the 'stringi' package. Includes tokenizers for shingled n-grams, skip
    n-grams, words, word stems, sentences, paragraphs, characters, lines, and
    regular expressions.
License: MIT + file LICENSE
LazyData: yes
Authors@R: c(person("Lincoln", "Mullen", role = c("aut", "cre"),
        email = "lincoln@lincolnmullen.com"),
        person("Dmitriy", "Selivanov", role = c("ctb"),
        email = "selivanov.dmitriy@gmail.com"))
URL: https://github.com/lmullen/tokenizers
BugReports: https://github.com/lmullen/tokenizers/issues
RoxygenNote: 5.0.1
Depends:
  R (>= 3.1.3)
Imports:
  stringi (>= 1.0.1),
  Rcpp (>= 0.12.3),
  SnowballC (>= 0.5.1)
LinkingTo: Rcpp
Suggests: testthat
```
3. URL for the package (the development repository, not a stylized HTML page)

https://github.com/lmullen/tokenizers

4. What data source(s) does it work with (if applicable)?

Natural language text

5. Who is the target audience?

Users of R packages for NLP

6. Are there other R packages that accomplish the same thing? If so, what is different about yours?

Virtually every R package for NLP implements a few tokenizers of its own. The point of this package is to collect all the tokenizers one could conceivably want in a single package and to give them a consistent interface. The package also aims to provide fast and correct tokenizers, implemented on top of stringi and Rcpp.
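
A short sketch of that consistent interface: every tokenizer accepts a character vector (one element per document) and returns a list of character vectors, one list element per input document. The function names follow the tokenizer families listed in the DESCRIPTION; treat the exact signatures as illustrative:

```r
library(tokenizers)

docs <- c(a = "One sentence here. Another sentence here.",
          b = "A second document.")

# Same input shape, same output shape, regardless of the tokenizer used
str(tokenize_words(docs))
str(tokenize_sentences(docs))
str(tokenize_ngrams(docs, n = 2))
```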

7. Check the box next to each policy below, confirming that you agree. These are mandatory.
    - This package does not violate the Terms of Service of any service it interacts with.
    - The repository has continuous integration with Travis CI and/or another service
    - The package contains a vignette
    - The package contains a reasonably complete README with devtools install instructions
    - The package contains unit tests
    - The package only exports functions to the NAMESPACE that are intended for end users
8. Do you agree to follow the rOpenSci packaging guidelines? These aren't mandatory, but we strongly suggest you follow them. If you disagree with anything, please explain.
    - Are there any package dependencies not on CRAN?
    - Do you intend for this package to go on CRAN?
    - Does the package have a CRAN accepted license?
    - Did devtools::check() produce any errors or warnings? If so paste them below.
9. Please add explanations below for any exceptions to the above:

The package does not contain a vignette, but it does contain an extensive README. Since all the tokenizers work in the same basic way, a vignette seems unnecessary. But if rOpenSci wants one, I can easily adapt the README into a standalone vignette.

This package is already on CRAN.
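
For reference, a sketch of the usual installation routes; the GitHub path matches the repository URL given above, and the package README is the canonical source for install instructions:

```r
# Released version from CRAN
install.packages("tokenizers")

# Development version from the GitHub repository (assumes devtools is installed)
devtools::install_github("lmullen/tokenizers")
```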

10. If this is a resubmission following rejection, please explain the change in circumstances.
