[BUG] Retire toponym corpus from Catalan Common Voice #4337

@c-armentano

Describe the bug
The sentences in this file are far too repetitive:
https://github.com/common-voice/common-voice/blob/main/server/data/ca/frases_agenda.txt

We created them to obtain a corpus covering all the toponyms of the Catalan-speaking area, but we weren't aware that each sentence would be recorded more than once. Some volunteers have complained that the sentences are repetitive, and the sentences may lead to a phonetically unbalanced corpus.

To Reproduce
N/A

Expected behavior
We would like to prevent these sentences from being offered for recording again.

Screenshots
N/A

Additional context
N/A

Metadata

Labels

Bug, Text Corpus (bugs or feature requests that are related to Text Corpus)
