
integrated tokenizer #47

@attardi

Description


A simple change is needed in order to integrate a tokenizer.
In file utils/transform.py, add an optional parameter to the method CoNLL.__init__():

    reader=open

and then set

    self.reader = reader

and in CoNLL.load(), change it to use it:

    if isinstance(data, str):
        if not hasattr(self, 'reader'): self.reader = open # back compatibility       
        with self.reader(data) as f:
            lines = [line.strip() for line in f]
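Put together, the proposed change can be sketched on a minimal stand-in class (the real CoNLL class in utils/transform.py has much more machinery; string_reader here is a made-up example reader, not part of the library):

```python
import io

class CoNLL:
    """Simplified stand-in for the CoNLL transform, showing the proposed change."""

    def __init__(self, reader=open):
        # reader defaults to the builtin open() so existing callers keep working
        self.reader = reader

    def load(self, data):
        if isinstance(data, str):
            if not hasattr(self, 'reader'):
                self.reader = open  # back compatibility with unpickled instances
            with self.reader(data) as f:
                lines = [line.strip() for line in f]
            return lines

# a custom reader serving an in-memory "file" instead of touching the filesystem
def string_reader(path):
    return io.StringIO("1\tHello\n2\tworld\n")

lines = CoNLL(reader=string_reader).load("ignored.conllx")
```

Any callable that takes a path and returns a context manager yielding lines works as a reader, which is why open() remains a valid default.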

You can then pass an NLTK tokenizer or a Stanza tokenizer as the reader.
I use this code to interface to Stanza:

tokenizer.py.txt
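The attached file is not reproduced here, but a reader of that kind can be sketched as follows. This is a hypothetical example, not the attachment: make_tokenizer_reader and whitespace_tokenize are made-up names, and a trivial whitespace tokenizer stands in for Stanza (it assumes the tokenizer returns sentences as lists of token strings):

```python
import contextlib

def make_tokenizer_reader(tokenize):
    """Build a reader usable in place of open(): it reads raw text from a path
    and yields CoNLL-style lines (one token per line, blank line per sentence)."""
    @contextlib.contextmanager
    def reader(path):
        with open(path) as f:
            text = f.read()
        lines = []
        for sentence in tokenize(text):
            for i, token in enumerate(sentence, 1):
                # minimal 10-column CoNLL-X row; only ID and FORM are filled in
                lines.append('\t'.join([str(i), token] + ['_'] * 8))
            lines.append('')  # blank line separates sentences
        yield iter(lines)
    return reader

# stand-in tokenizer: one sentence per input line, tokens split on whitespace
def whitespace_tokenize(text):
    return [line.split() for line in text.splitlines() if line.strip()]

# demo: run the reader over a small temporary file
import os, tempfile
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("Hello world\nGoodbye moon\n")
    path = tmp.name
reader = make_tokenizer_reader(whitespace_tokenize)
with reader(path) as f:
    conll_lines = [line for line in f]
os.unlink(path)
```

To plug in Stanza instead, the tokenize argument would wrap a stanza.Pipeline configured with processors='tokenize' and extract the token texts from each sentence it returns.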
