[v2] Reupload datasets with trust_remote_code #1877

@Samoed

These datasets are still in the old, non-Parquet format. If we reupload them, it would be faster to work with them.

exceptions = [
"BornholmBitextMining",
"BibleNLPBitextMining",
"DiaBlaBitextMining",
"FloresBitextMining",
"IN22ConvBitextMining",
"NTREXBitextMining",
"IN22GenBitextMining",
"IndicGenBenchFloresBitextMining",
"IWSLT2017BitextMining",
"SRNCorpusBitextMining",
"VieMedEVBitextMining",
"HotelReviewSentimentClassification",
"TweetEmotionClassification",
"DanishPoliticalCommentsClassification",
"TenKGnadClassification",
"ArxivClassification",
"FinancialPhrasebankClassification",
"FrenkEnClassification",
"PatentClassification",
"PoemSentimentClassification",
"TweetTopicSingleClassification",
"YahooAnswersTopicsClassification",
"FilipinoHateSpeechClassification",
"HebrewSentimentAnalysis",
"HindiDiscourseClassification",
"FrenkHrClassification",
"Itacola",
"JavaneseIMDBClassification",
"WRIMEClassification",
"KorHateClassification",
"KorSarcasmClassification",
"AfriSentiClassification",
"AmazonCounterfactualClassification",
"AmazonReviewsClassification",
"MTOPDomainClassification",
"MTOPIntentClassification",
"NaijaSenti",
"NordicLangClassification",
"NusaX-senti",
"SwissJudgementClassification",
"MyanmarNews",
"DutchBookReviewSentimentClassification",
"NorwegianParliamentClassification",
"PAC",
"HateSpeechPortugueseClassification",
"Moroco",
"RomanianReviewsSentiment",
"RomanianSentimentClassification",
"GeoreviewClassification",
"FrenkSlClassification",
"DalajClassification",
"SwedishSentimentClassification",
"WisesightSentimentClassification",
"UrduRomanSentimentClassification",
"VieStudentFeedbackClassification",
"IndicReviewsClusteringP2P",
"MasakhaNEWSClusteringP2P",
"MasakhaNEWSClusteringS2S",
"MLSUMClusteringP2P.v2",
"CodeSearchNetRetrieval",
"DanFEVER",
"GerDaLIR",
"GermanDPR",
"AlphaNLI",
"ARCChallenge",
"FaithDial",
"HagridRetrieval",
"HellaSwag",
"PIQA",
"Quail",
"RARbCode",
"RARbMath",
"SIQA",
"SpartQA",
"TempReasonL1",
"TempReasonL2Context",
"TempReasonL2Fact",
"TempReasonL2Pure",
"TempReasonL3Context",
"TempReasonL3Fact",
"TempReasonL3Pure",
"TopiOCQA",
"WinoGrande",
"AlloprofRetrieval",
"BSARDRetrieval",
"JaGovFaqsRetrieval",
"JaQuADRetrieval",
"NLPJournalAbsIntroRetrieval",
"NLPJournalTitleAbsRetrieval",
"NLPJournalTitleIntroRetrieval",
"IndicQARetrieval",
"MintakaRetrieval",
"MIRACLRetrieval",
"MLQARetrieval",
"MultiLongDocRetrieval",
"NeuCLIR2022Retrieval",
"NeuCLIR2023Retrieval",
"XMarket",
"XPQARetrieval",
"ArguAna-PL",
"DBPedia-PL",
"FiQA-PL",
"HotpotQA-PL",
"MSMARCO-PL",
"NFCorpus-PL",
"NQ-PL",
"Quora-PL",
"SCIDOCS-PL",
"SciFact-PL",
"TRECCOVID-PL",
"SpanishPassageRetrievalS2P",
"SpanishPassageRetrievalS2S",
"SwednRetrieval",
"SweFaqRetrieval",
"KorHateSpeechMLClassification",
"BrazilianToxicTweetsClassification",
"CTKFactsNLI",
"LegalBenchPC",
"indonli",
"OpusparcusPC",
"PawsX",
"XStance",
"MIRACLReranking",
"FinParaSTS",
"JSICK",
"JSTS",
"RonSTS",
"STSES",
"AlloProfClusteringP2P.v2",
"AlloProfClusteringS2S.v2",
"LivedoorNewsClustering",
"MewsC16JaClustering",
"MLSUMClusteringS2S.v2",
"SwednClusteringP2P",
"SwednClusteringS2S",
"IndicXnliPairClassification",
]
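For reference, reuploading one of these datasets could look roughly like the sketch below. It assumes a `datasets` release that still accepts `trust_remote_code`, and that you are already logged in to the Hub. The function names and the `my-org` namespace are placeholders made up for illustration, not anything that exists in mteb.

```python
from typing import Optional


def target_repo_id(source_path: str, namespace: str = "my-org") -> str:
    """Build a destination repo id (hypothetical naming scheme)."""
    # Keep only the dataset name and move it under the new namespace.
    name = source_path.split("/")[-1]
    return f"{namespace}/{name}"


def reupload_as_parquet(
    source_path: str,
    config: Optional[str] = None,
    revision: Optional[str] = None,
    namespace: str = "my-org",
) -> str:
    """Load a script-based dataset once and push it back as Parquet."""
    # Imported lazily so target_repo_id works without `datasets` installed.
    from datasets import load_dataset

    # This runs the dataset's loading script, which is the slow path
    # the reupload is meant to retire.
    ds = load_dataset(
        source_path, config, revision=revision, trust_remote_code=True
    )
    repo_id = target_repo_id(source_path, namespace)
    # push_to_hub stores the data as Parquet files, so later loads
    # no longer need the loading script.
    ds.push_to_hub(repo_id, config_name=config or "default")
    return repo_id
```

Datasets with several configs would need one call per config. After the reupload, the task's metadata has to point to the new repo and revision, and `trust_remote_code` can be dropped from it.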
