Add Flag to Avoid Treating NUL Separated Input as Binary #2974

@LangLangBart

Discussed in #2971


Issue

Currently, running a command like the following will print a warning:

printf "First\0" | bat -p
[Screenshot: SCR-20240529-trdo]

The warning is defined in src/printer.rs:

bat/src/printer.rs

Lines 435 to 444 in 8f8c953

if !self.config.style_components.header() {
    if Some(ContentType::BINARY) == self.content_type && !self.config.show_nonprintable {
        writeln!(
            handle,
            "{}: Binary content from {} will not be printed to the terminal \
             (but will be present if the output of 'bat' is piped). You can use 'bat -A' \
             to show the binary file contents.",
            Yellow.paint("[bat warning]"),
            input.description.summary(),
        )?;

The decision to label the input as BINARY seems to be made in src/input.rs:

bat/src/input.rs

Lines 260 to 271 in 8f8c953

let mut first_line = vec![];
reader.read_until(b'\n', &mut first_line).ok();
let content_type = if first_line.is_empty() {
    None
} else {
    Some(content_inspector::inspect(&first_line[..]))
};
if content_type == Some(ContentType::UTF_16LE) {
    reader.read_until(0x00, &mut first_line).ok();
}

A hacky workaround is to make the first line empty, use bat, and then remove the first line:

printf "\nFirst\0" | bat -p | sed '1d'
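
To make the behavior above concrete, here is a minimal, self-contained sketch of the first-line check. It is not bat's code: `looks_binary` is a hypothetical stand-in for `content_inspector::inspect` that only looks for NUL bytes, which I assume approximates how the crate classifies this input (the real crate distinguishes more cases, such as the UTF_16LE type seen above). `first_line_is_binary` is likewise a made-up helper name.

```rust
use std::io::{BufRead, Cursor};

/// Hypothetical stand-in for `content_inspector::inspect`: treats any buffer
/// containing a NUL byte as binary. This is an assumption, not the crate's code.
fn looks_binary(buf: &[u8]) -> bool {
    buf.contains(&0x00)
}

/// Mirrors the snippet from src/input.rs: read up to the first newline and
/// inspect only that chunk. Returns None for empty input.
fn first_line_is_binary(input: &[u8]) -> Option<bool> {
    let mut reader = Cursor::new(input);
    let mut first_line = vec![];
    reader.read_until(b'\n', &mut first_line).ok();

    if first_line.is_empty() {
        None
    } else {
        Some(looks_binary(&first_line))
    }
}
```

Under these assumptions, `first_line_is_binary(b"First\0")` yields `Some(true)` because the input has no newline, so the whole input, NUL included, counts as the first line. `first_line_is_binary(b"\nFirst\0")` yields `Some(false)` because only the leading `\n` is inspected, which would explain why the workaround suppresses the warning.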

Proposed solution

Add a new flag that keeps content_type from being labeled as BINARY when the first line ends with a NUL byte:

# naming the flag '--text' to align with 'grep/git diff'
printf "First\0" | bat -p --text
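
As a rough sketch only, this is one way such a flag could gate the detection. Everything here is hypothetical: the local `ContentType` enum, the `Config` struct with its `text` field, and the `inspect` and `detect_content_type` functions are illustrative stand-ins, not bat's or content_inspector's actual types, and the NUL-byte heuristic is an assumed simplification.

```rust
/// Local stand-in for `content_inspector::ContentType` (only two variants kept).
#[derive(Debug, PartialEq, Clone, Copy)]
enum ContentType {
    Binary,
    Utf8,
}

/// Hypothetical configuration; `text` models the proposed `--text` flag.
struct Config {
    text: bool,
}

/// Simplified detection: a NUL byte means binary (an assumed heuristic).
fn inspect(buf: &[u8]) -> ContentType {
    if buf.contains(&0x00) {
        ContentType::Binary
    } else {
        ContentType::Utf8
    }
}

/// Classify the first line, but never report Binary when `text` is set.
fn detect_content_type(first_line: &[u8], config: &Config) -> Option<ContentType> {
    if first_line.is_empty() {
        return None;
    }
    match inspect(first_line) {
        // With the flag set, fall back to treating the input as text.
        ContentType::Binary if config.text => Some(ContentType::Utf8),
        other => Some(other),
    }
}
```

In this sketch the override happens at detection time, so the warning branch in the printer would never see a binary content type and would need no change. Whether that is the right place for the check in bat itself is an open design question.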

The crate [1] used to determine whether content is binary states:

//! encoding). Note that **this analysis can fail**. For example, even if unlikely, UTF-8-encoded
//! text can legally contain NULL bytes. Conversely, some particular binary formats (like binary

Based on this, a --text flag would be appropriate, mirroring the one that grep and git diff already provide.

printf "First\0" | grep 'First'
# grep: (standard input): binary file matches

printf "First\0" | grep --text 'First'
# First

Footnotes

  1. sharkdp/content_inspector: Fast inspection of binary buffers to guess/determine the type of content
