As in the paper. We should add the same options that the predict config has, then adapt the logic from this script: https://github.com/yardencsGitHub/tweetynet/blob/master/article/src/scripts/run_eval_with_and_without_output_transforms.py
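
A rough sketch of what "eval with and without output transforms" could look like, to make the request concrete. This is not the linked script's code: `frame_error`, `majority_vote`, and `eval_with_and_without_transform` are hypothetical names, the background label `0` is an assumption, and majority vote stands in for whichever transform options end up shared with the predict config.

```python
from collections import Counter
from itertools import groupby


def frame_error(y_true, y_pred):
    """Fraction of frames whose predicted label differs from the reference."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_vote(y_pred, background=0):
    """Relabel each run of consecutive non-background frames with the
    most common label inside that run (hypothetical stand-in transform)."""
    out = []
    for is_background, run in groupby(y_pred, key=lambda lbl: lbl == background):
        run = list(run)
        if is_background:
            # Background frames are left untouched.
            out.extend(run)
        else:
            winner = Counter(run).most_common(1)[0][0]
            out.extend([winner] * len(run))
    return out


def eval_with_and_without_transform(y_true, y_pred, transform=majority_vote):
    """Compute the same metric on raw and on transformed predictions."""
    return {
        "frame_error": frame_error(y_true, y_pred),
        "frame_error_transformed": frame_error(y_true, transform(y_pred)),
    }


# Toy data: one frame inside the first segment is mislabeled as class 2.
y_true = [0, 0, 1, 1, 1, 1, 0, 2, 2, 2, 0]
y_pred = [0, 0, 1, 1, 2, 1, 0, 2, 2, 2, 0]
result = eval_with_and_without_transform(y_true, y_pred)
```

Reporting both numbers side by side is the point: the gap between them shows how much of the error the clean-up step removes.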