

HarikalarKutusu

There are cases where metadata .tsv files become malformed because free-form text is written to them without being cleaned. This includes sentences, sentence sources, and, in reported.tsv, both the reported sentences and the reasons people enter when they choose the "other" option and type free-form text.
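To illustrate how one uncleaned value breaks a row, here is a minimal sketch. It assumes a plain tab-separated format with no quoting or escaping. The helpers `toTsvLine` and `findMalformedRows` and the column names are hypothetical, chosen for illustration, and are not taken from the project's code.

```typescript
// Hypothetical helper: build one TSV line by joining fields with tabs.
// No quoting or escaping is applied, so tabs/newlines inside a field leak through.
function toTsvLine(fields: string[]): string {
  return fields.join("\t");
}

// Hypothetical check: split a TSV payload into lines and return the indices
// of lines whose column count differs from the header's column count.
function findMalformedRows(tsv: string): number[] {
  const lines = tsv.split("\n").filter((line) => line.length > 0);
  const expected = lines[0].split("\t").length;
  const bad: number[] = [];
  lines.forEach((line, i) => {
    if (line.split("\t").length !== expected) bad.push(i);
  });
  return bad;
}

// Illustrative columns; a free-form "other" reason containing a tab and a newline.
const header = toTsvLine(["sentence_id", "sentence", "reason"]);
const row = toTsvLine(["abc123", "Hello world.", "typo\tand\nbad grammar"]);
const payload = [header, row].join("\n");

// The single logical row is split in two, and both halves have the wrong column count.
console.log(findMalformedRows(payload)); // [1, 2]
```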

The problems were reported and analyzed in these topics:

The "sentence" field in *_sentences.tsv was previously fixed here.

This PR extends that fix to:

  • Source fields in *_sentences.tsv files
  • Sentence and reason fields in reported.tsv
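The PR description does not show the code change itself. As a minimal sketch of the kind of cleanup described, assume that any run of tabs, carriage returns, or newlines in a free-form field is collapsed to a single space and the result is trimmed. This replacement policy, the `cleanFreeText` and `reportedRowToTsv` names, and the `ReportedRow` shape are all hypothetical, not the PR's actual implementation.

```typescript
// Hypothetical sanitizer for free-form text written into a TSV cell.
// Assumption: each run of tab, CR, or LF characters becomes one space;
// the real fix may handle these characters differently.
function cleanFreeText(value: string | null | undefined): string {
  if (value == null) return "";
  return value.replace(/[\t\r\n]+/g, " ").trim();
}

// Hypothetical row shape for reported.tsv (field names are illustrative).
interface ReportedRow {
  sentenceId: string;
  sentence: string;
  reason: string;
}

// Clean every free-form field before joining; the same sanitizer would
// apply to the source field when writing *_sentences.tsv.
function reportedRowToTsv(row: ReportedRow): string {
  return [row.sentenceId, cleanFreeText(row.sentence), cleanFreeText(row.reason)].join("\t");
}

const line = reportedRowToTsv({
  sentenceId: "abc123",
  sentence: "Hello\tworld.",
  reason: "typo\r\nand bad grammar ",
});
console.log(line.split("\t").length); // 3
```

Collapsing whitespace keeps every row on one line with a fixed column count, at the cost of losing the original line breaks in free-form text. Escaping them instead would preserve that text but require every consumer of the .tsv files to unescape it.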

Note: This has NOT been tested on real data, but it should be OK because the modifications are simple. If merged, it should be tested before the next release.

HarikalarKutusu requested a review from a team as a code owner on March 27, 2025 22:32.
HarikalarKutusu requested a review from moz-dfeller and removed the request for a team on March 27, 2025 22:32.