Releases: danieldk/jitar

Tagging of CoNLL-X files and bugfixes

01 Oct 09:03

Changes in this release:

  • Fix a bug where the start/end markers could be used when handling unknown tokens (typically an unseen punctuation character). This change does not require retraining.
  • Add a utility, jitar-tag-conllx, to tag files in the CoNLL-X format. All other columns are preserved.
  • Compute interpolated scores only once.
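
As an illustration of tagging a CoNLL-X file while preserving its other columns, here is a minimal Python sketch. It is not Jitar's code: the function name tag_conllx, the tagger callable, and the choice of POSTAG (the fifth column) as the column to overwrite are all assumptions made for this example.

```python
from typing import Callable, Iterable, Iterator, List

# Zero-based index of the POSTAG column in CoNLL-X.
# Assumption: this is the column the tagger output should be written to.
POSTAG_COLUMN = 4


def tag_conllx(lines: Iterable[str],
               tagger: Callable[[List[str]], List[str]]) -> Iterator[str]:
    """Yield CoNLL-X lines with POSTAG replaced; other columns are untouched.

    `tagger` is a hypothetical callable mapping a list of word forms to a
    list of tags of the same length.
    """
    sentence: List[List[str]] = []

    def flush() -> Iterator[str]:
        # Tag the buffered sentence as a whole and emit its rows.
        if not sentence:
            return
        forms = [row[1] for row in sentence]  # FORM is the second column
        tags = tagger(forms)
        for row, tag in zip(sentence, tags):
            row[POSTAG_COLUMN] = tag
            yield "\t".join(row)
        sentence.clear()

    for line in lines:
        line = line.rstrip("\n")
        if line.strip():
            sentence.append(line.split("\t"))
        else:
            # A blank line ends the current sentence.
            yield from flush()
            yield ""
    yield from flush()
```

Sentences are buffered until a blank line so that the tagger sees a whole sentence at once, which is what a trigram tagger needs.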

It's Christmas!

31 Jul 07:14

Changes compared to Jitar 0.1.0:

  • Add capitalization marking to tags (as per the TnT paper). This gives an improvement of around 0.2% on German and English.
  • Add a separate unknown word distribution for words containing a dash. This provides a modest improvement for English and German.
  • API simplification (no more need to use/specify start and end markers).
  • Java-style corpus readers.
  • Unified training and tagging data structures.
  • Add a utility for 10-fold cross-validation.
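
For readers unfamiliar with 10-fold cross-validation, the sketch below shows one common way to split a corpus into folds. It is a generic Python illustration, not Jitar's utility: the function name k_fold_splits and the round-robin assignment of sentences to folds are assumptions, and Jitar may partition the corpus differently.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(sentences: Sequence[T],
                  k: int = 10) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield (train, test) pairs, one per fold.

    Assumption: sentence i is held out in fold i % k (round-robin).
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    for fold in range(k):
        train = [s for i, s in enumerate(sentences) if i % k != fold]
        test = [s for i, s in enumerate(sentences) if i % k == fold]
        yield train, test
```

Each sentence is held out exactly once; a model would be trained on each train part, evaluated on the matching test part, and the accuracies averaged over the folds.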

The changes break existing models, so you should retrain your model when switching to Jitar 0.3.0.

Jitar 0.1.0

03 Oct 16:12
jitar-0.1.0

[maven-release-plugin]  copy for tag jitar-0.1.0