Skip to content

Encoded sequences can clash with separator byte and cause assertion errors #85

@danielnaber

Description

@danielnaber

This causes the assert to fail in TrimSuffixEncoder.java:60:

Dictionary d = Dictionary.read(Paths.get("/tmp/org/languagetool/resource/de/german_synth.dict"));
DictionaryLookup dict = new DictionaryLookup(d);
dict.lookup("anfragen|VER:1:PLU:KJ1:SFT");  // works
dict.lookup("anfragen|DOESNOTEXIST");  // works
dict.lookup("anfragen|VER:1:PLU:KJ1:SFT:NEB");  // AssertionError at TrimSuffixEncoder.java:60

To reproduce, you can get the german_synth.dict from http://search.maven.org/remotecontent?filepath=de/danielnaber/german-pos-dict/1.0/german-pos-dict-1.0.jar

Stacktrace is this:

java.lang.AssertionError
	at morfologik.stemming.TrimSuffixEncoder.decode(TrimSuffixEncoder.java:60)
	at morfologik.stemming.DictionaryLookup.lookup(DictionaryLookup.java:217)
	at org.languagetool.CrashTest.test(CrashTest.java:18)

In a less stripped down case this shows as (at least I assume this is the same issue):

Caused by: java.lang.IndexOutOfBoundsException
	at java.nio.Buffer.checkIndex(Buffer.java:540)
	at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:139)
	at morfologik.stemming.TrimSuffixEncoder.decode(TrimSuffixEncoder.java:62)
	at morfologik.stemming.DictionaryLookup.lookup(DictionaryLookup.java:217)

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions