@orionw (Contributor) commented Apr 17, 2024

Real PR for FollowIR (old draft for reference in #356).

Re-uses as much of the AbsTaskRetrieval abstract task as possible, thanks to previous feedback. Dataset task files are up to date with the newer MTEB metadata. Adds a test to check that things load and that the metric works.

I left the sample and n_characters metadata empty for now. I'm updating the dataset slightly (should be done in a few days) and will make a new PR when it's finished to update the dataset revisions and this metadata.

@orionw (Contributor, Author) commented Apr 17, 2024

@KennethEnevoldsen and @Muennighoff, this is ready to be looked at when you get time. Thanks for your help!

@imenelydiaker (Contributor) left a comment

Very nice job @orionw! Can you please run the proposed tasks and add the results? You just need to run mteb on the tasks with intfloat/multilingual-e5-small and sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 and push the JSON files 🙂
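
For readers following along, here is a minimal sketch of what such a run might look like. It assumes the `MTEB(tasks=[...])` and `evaluation.run(model, output_folder=...)` interface that mteb exposed around the time of this PR (newer releases may differ). The task names in `TASKS` and the `results/<org>__<model>` folder layout are assumptions for illustration, not taken from this thread, and `results_folder` and `run_all` are hypothetical helpers. Note that the Hugging Face ID of the second model uses a hyphen (`sentence-transformers/...`).

```python
from pathlib import Path

# The two baseline models named in the review comment.
MODELS = [
    "intfloat/multilingual-e5-small",
    "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
]

# ASSUMPTION: task names are illustrative; use the names registered by the PR.
TASKS = [
    "Core17InstructionRetrieval",
    "News21InstructionRetrieval",
    "Robust04InstructionRetrieval",
]


def results_folder(model_name: str, root: str = "results") -> Path:
    """Map a model ID to one output directory (hypothetical layout)."""
    # Replace "/" so the organisation and model name form a single folder name.
    return Path(root) / model_name.replace("/", "__")


def run_all(models=MODELS, tasks=TASKS) -> None:
    """Evaluate every model on every task and write JSON results to disk."""
    # Imported lazily so results_folder() works without the heavy dependencies.
    from mteb import MTEB
    from sentence_transformers import SentenceTransformer

    for name in models:
        model = SentenceTransformer(name)
        evaluation = MTEB(tasks=tasks)
        evaluation.run(model, output_folder=str(results_folder(name)))
```

Calling `run_all()` downloads both models and the datasets, so it can take a while; the JSON files written under each output folder are what would be pushed to the PR.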

@orionw (Contributor, Author) commented Apr 17, 2024

Thanks @imenelydiaker! I added the small model (e5-small) results I ran as a verification check. I plan on adding a lot more (all the ones in my paper) but since I'm making some minor updates to the dataset I was gonna wait to re-run them all.

I also plan to make a PR to create the leaderboard to show these results and another PR to add rerankers to MTEB, so this is just the first of many :)

@KennethEnevoldsen (Contributor) left a comment

Thanks for the updates to the tests. Generally the PR looks really good and the code is much cleaner! I did not discover any major issues; my comments are mostly about documentation.

@orionw (Contributor, Author) commented Apr 18, 2024

Thanks @KennethEnevoldsen! Added the metadata and documentation 🙂

@KennethEnevoldsen (Contributor) left a comment

It looks good. I think this is ready to merge after a cleanup.

Will you also add the points?

@KennethEnevoldsen (Contributor) commented

@orionw, please note the changes to the scoring tracking system (#438) to avoid merge conflicts.

@orionw orionw removed the request for review from Muennighoff April 19, 2024 12:52

@KennethEnevoldsen (Contributor) commented

Everything is looking good. I will merge this in! Very happy with this addition 🎉
