jondegenhardt

This PR changes csv2tsv to have separate command line arguments for the TAB and Newline replacement strings. Prior to this, csv2tsv used the same replacement string for both. The replacement strings default to a single space as before.

The previous command line argument, --r|replacement, has been replaced by a pair of arguments:

  • --r|tab-replacement - Replacement string for TSV field delimiters, normally TABs, found in the CSV data.
  • --n|newline-replacement - Replacement string for newlines (record delimiters) found in the CSV data.
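The effect of having two separate replacement strings can be illustrated with a minimal Python sketch. This is not the csv2tsv implementation (which is written in D); the function name `csv_to_tsv` and its parameters are hypothetical, and treating CRLF or a lone CR as a single newline is an assumption of this sketch only.

```python
import csv
import io


def csv_to_tsv(csv_text, tab_replacement=" ", newline_replacement=" "):
    """Convert CSV text to TSV, replacing embedded TABs and newlines separately.

    Hypothetical helper for illustration; both replacements default to a
    single space, mirroring the defaults described in this PR.
    """
    out = []
    # newline="" keeps newlines inside quoted fields untouched for csv.reader.
    for row in csv.reader(io.StringIO(csv_text, newline="")):
        fields = []
        for field in row:
            # Assumption of this sketch: CRLF and lone CR count as one newline.
            field = field.replace("\r\n", "\n").replace("\r", "\n")
            field = field.replace("\t", tab_replacement)
            field = field.replace("\n", newline_replacement)
            fields.append(field)
        out.append("\t".join(fields) + "\n")
    return "".join(out)


if __name__ == "__main__":
    data = 'id,"note with\ttab","two\nlines"\n'
    # TABs become spaces, embedded newlines become U+2424.
    print(csv_to_tsv(data, tab_replacement=" ", newline_replacement="\u2424"), end="")
```

With the defaults, both embedded TABs and embedded newlines become a single space, which matches the behavior before this change.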

This change makes it easier to preserve the original CSV data when the need arises. For example, several Unicode characters can stand in for TAB and Newline. It may also be desirable to replace TABs with spaces while using a Unicode Newline representation for newlines in the data. Some relevant Unicode characters:

  • U+2028 - Line Separator
  • U+2029 - Paragraph Separator
  • U+2424 (␤) - Visual symbol for Newline
  • U+2409 (␉) - Visual symbol for Horizontal TAB

None of these characters are used as field or record terminators in TSV, so they can be used safely. The choice to use these characters or any others as replacements can only be made in the context of the task being performed. This PR better enables these choices.
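As a rough illustration of why distinct replacements help preserve the original data, the Python sketch below maps TABs and newlines to the visual symbols U+2409 and U+2424 and then maps them back. The helper names `encode_field` and `decode_field` are hypothetical and not part of csv2tsv. The round trip is exact only under the assumption that the original field does not already contain either symbol.

```python
TAB_SYMBOL = "\u2409"      # SYMBOL FOR HORIZONTAL TABULATION
NEWLINE_SYMBOL = "\u2424"  # SYMBOL FOR NEWLINE


def encode_field(field):
    """Replace TABs and newlines with distinct visual symbols (hypothetical helper)."""
    return field.replace("\t", TAB_SYMBOL).replace("\n", NEWLINE_SYMBOL)


def decode_field(field):
    """Reverse encode_field; assumes the symbols did not occur in the original."""
    return field.replace(TAB_SYMBOL, "\t").replace(NEWLINE_SYMBOL, "\n")


if __name__ == "__main__":
    original = "line one\n\tindented line two"
    encoded = encode_field(original)
    # The encoded field contains no TSV field or record delimiters.
    print("\t" not in encoded and "\n" not in encoded)
    # Distinct replacements allow the original text to be recovered.
    print(decode_field(encoded) == original)
```

With a single shared replacement string, the reverse mapping would be ambiguous, since a replaced TAB and a replaced newline would look the same in the TSV output.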

@codecov-commenter

Codecov Report

Merging #303 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #303   +/-   ##
=======================================
  Coverage   99.36%   99.36%           
=======================================
  Files          18       18           
  Lines        6941     6943    +2     
=======================================
+ Hits         6897     6899    +2     
  Misses         44       44           
Impacted Files Coverage Δ
csv2tsv/src/tsv_utils/csv2tsv.d 100.00% <100.00%> (ø)

@jondegenhardt jondegenhardt merged commit c439745 into eBay:master Sep 8, 2020
@jondegenhardt jondegenhardt deleted the csv2tsv-newline-replacement branch September 8, 2020 05:05