
Backfill task metadata for HISTORIC DATASETS #2502

@isaac-chung

The TaskMetadata for the following tasks still has None for a few parameters. The goal is to fill in those parameters and remove each task's name from the _HISTORIC_DATASETS variable in tests/test_TaskMetadata.py.

Instructions:

  1. Check which datasets in the list below are still incomplete.
  2. Comment below which datasets you'd like to take. Please only take 1-2 at a time.
  3. Fork the repo, check out a feature branch, and make the changes (i.e. fill in the missing metadata and remove the task name from _HISTORIC_DATASETS).
  4. Run pytest tests/test_TaskMetadata.py to check that your changes are complete.
  5. Make a PR in the repo. In the description, add "Part of Backfill task metadata for HISTORIC DATASETS #2502"

Datasets:

  • PolEmo2.0-IN
  • PolEmo2.0-OUT
  • AllegroReviews (PR: Backfill task metadata for metadata for BigPatentClustering and AllegroReviews #2755)
  • PAC
  • TNews
  • IFlyTek
  • MultilingualSentiment
  • JDReview
  • OnlineShopping
  • Waimai
  • BlurbsClusteringP2P
  • BlurbsClusteringS2S
  • TenKGnadClusteringP2P
  • TenKGnadClusteringS2S
  • ArxivClusteringP2P
  • ArxivClusteringS2S
  • BigPatentClustering (PR: Backfill task metadata for metadata for BigPatentClustering and AllegroReviews #2755)
  • RedditClustering
  • RedditClusteringP2P
  • StackExchangeClustering
  • StackExchangeClusteringP2P
  • TwentyNewsgroupsClustering
  • WikiCitiesClustering
  • AlloProfClusteringP2P
  • AlloProfClusteringS2S
  • HALClusteringS2S
  • MLSUMClusteringP2P
  • MLSUMClusteringS2S
  • MasakhaNEWSClusteringP2P
  • MasakhaNEWSClusteringS2S
  • EightTagsClustering
  • RomaniBibleClustering
  • SpanishNewsClusteringP2P
  • SwednClustering
  • CLSClusteringS2S
  • CLSClusteringP2P
  • ThuNewsClusteringS2S
  • ThuNewsClusteringP2P
  • TV2Nordretrieval
  • TwitterHjerneRetrieval
  • GerDaLIR
  • GerDaLIRSmall
  • GermanDPR (PR: Backfill task metadata for metadata for GermanDPR and GermanQuAD #2566)
  • GermanQuAD-Retrieval (PR: Backfill task metadata for metadata for GermanDPR and GermanQuAD #2566)
  • LegalQuAD
  • AILACasedocs
  • AILAStatutes
  • ArguAna
  • ClimateFEVER
  • CQADupstackRetrieval
  • CQADupstackAndroidRetrieval
  • CQADupstackEnglishRetrieval
  • CQADupstackGamingRetrieval
  • CQADupstackGisRetrieval
  • CQADupstackMathematicaRetrieval
  • CQADupstackPhysicsRetrieval
  • CQADupstackProgrammersRetrieval
  • CQADupstackStatsRetrieval
  • CQADupstackTexRetrieval
  • CQADupstackUnixRetrieval
  • CQADupstackWebmastersRetrieval
  • CQADupstackWordpressRetrieval
  • DBPedia
  • FEVER
  • FiQA2018
  • HagridRetrieval
  • LegalBenchConsumerContractsQA
  • LegalBenchCorporateLobbying
  • LegalSummarization
  • LEMBNeedleRetrieval
  • LEMBPasskeyRetrieval
  • MSMARCO
  • MSMARCOv2
  • NarrativeQARetrieval
  • NFCorpus
  • NQ
  • QuoraRetrieval
  • SCIDOCS
  • SciFact
  • Touche2020
  • TRECCOVID
  • AlloprofRetrieval
  • BSARDRetrieval
  • SyntecRetrieval
  • JaQuADRetrieval
  • Ko-miracl
  • Ko-StrategyQA
  • MintakaRetrieval
  • MIRACLRetrieval
  • MultiLongDocRetrieval
  • XMarket
  • SNLRetrieval
  • ArguAna-PL
  • DBPedia-PL
  • FiQA-PL
  • HotpotQA-PL
  • MSMARCO-PL
  • NFCorpus-PL
  • NQ-PL
  • Quora-PL
  • SCIDOCS-PL
  • SciFact-PL
  • TRECCOVID-PL
  • SpanishPassageRetrievalS2P
  • SpanishPassageRetrievalS2S
  • SweFaqRetrieval
  • T2Retrieval
  • MMarcoRetrieval
  • DuRetrieval
  • CovidRetrieval
  • CmedqaRetrieval
  • EcomRetrieval
  • MedicalRetrieval
  • VideoRetrieval
  • LeCaRDv2
  • SprintDuplicateQuestions
  • TwitterSemEval2015
  • TwitterURLCorpus
  • OpusparcusPC
  • PawsX
  • SICK-E-PL
  • PpcPC
  • CDSC-E
  • PSC
  • Ocnli
  • Cmnli
  • AskUbuntuDupQuestions
  • MindSmallReranking
  • SciDocsRR
  • StackOverflowDupQuestions
  • AlloprofReranking
  • SyntecReranking
  • T2Reranking
  • MMarcoReranking
  • CMedQAv1-reranking
  • CMedQAv2-reranking
  • GermanSTSBenchmark
  • BIOSSES
  • SICK-R
  • STS12
  • STS13
  • STS14
  • STS15
  • STS16
  • STSBenchmark
  • FinParaSTS
  • SICKFr
  • KLUE-STS
  • KorSTS
  • STS17
  • STS22
  • STSBenchmarkMultilingualSTS
  • SICK-R-PL
  • CDSC-R
  • RonSTS
  • STSES
  • ATEC
  • BQ
  • LCQMC
  • PAWSX
  • STSB
  • AFQMC
  • QBQTC
  • SummEval
  • SummEvalFr
  • MalayalamNewsClassification
  • TamilNewsClassification
  • TenKGnadClusteringP2P.v2
  • TenKGnadClusteringS2S.v2
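
To see at a glance what is still missing for a task before you start, something like the sketch below may help. It is only an illustration: it works on a plain dict rather than the real TaskMetadata class, `missing_fields` is a hypothetical helper that does not exist in the repo, and the field names and values in `example` are made up for the demo.

```python
from typing import Any


def missing_fields(metadata: dict[str, Any]) -> list[str]:
    """Return the names of metadata fields whose value is still None, sorted."""
    return sorted(name for name, value in metadata.items() if value is None)


# Hypothetical metadata for illustration only; not copied from any real task.
example = {
    "name": "ExampleTask",
    "date": None,
    "domains": ["News"],
    "license": None,
    "annotations_creators": "derived",
}

print(missing_fields(example))  # prints ['date', 'license']
```

Once the list comes back empty for a task, its name should be ready to remove from _HISTORIC_DATASETS; pytest tests/test_TaskMetadata.py remains the actual check.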
