Skip to content

Conversation

estebank
Copy link
Contributor

@estebank estebank commented Jan 15, 2025

error: couldn't read `$DIR/not-utf8-bin-file.rs`: stream did not contain valid UTF-8
  --> $DIR/not-utf8-2.rs:6:5
   |
LL |     include!("not-utf8-bin-file.rs");
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
note: byte `193` is not valid utf-8
  --> $DIR/not-utf8-bin-file.rs:2:14
   |
LL |     let _ = "�|�␂!5�cc␕␂��";
   |              ^
   = note: this error originates in the macro `include` (in Nightly builds, run with -Z macro-backtrace for more info)

When we attempt to load a Rust source code file, if there is a OS file failure we try reading the file as bytes. If that succeeds we try to turn it into UTF-8. If that fails, we provide additional context about where the file has the first invalid UTF-8 character.

Fix #76869.

@rustbot
Copy link
Collaborator

rustbot commented Jan 15, 2025

r? @fee1-dead

rustbot has assigned @fee1-dead.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added A-compiletest Area: The compiletest test runner A-testsuite Area: The testsuite used to check the correctness of rustc S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jan 15, 2025
@rustbot
Copy link
Collaborator

rustbot commented Jan 15, 2025

Some changes occurred in src/tools/compiletest

cc @jieyouxu

@rust-log-analyzer

This comment has been minimized.

@rustbot rustbot added the A-tidy Area: The tidy tool label Jan 15, 2025
@rust-log-analyzer

This comment has been minimized.

@@ -101,6 +101,8 @@ pub fn load_errors(testfile: &Path, revision: Option<&str>) -> Vec<Error> {

rdr.lines()
.enumerate()
// We want to ignore utf-8 failures in tests during collection of annotations.
.filter(|(_, line)| line.is_ok())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This...

@@ -58,7 +58,7 @@ pub fn check(tests_path: impl AsRef<Path>, bad: &mut bool) {

let mut expected_revisions = BTreeSet::new();

let contents = std::fs::read_to_string(test).unwrap();
let Ok(contents) = std::fs::read_to_string(test) else { continue };
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...this, and another line in md-doc blow up if a test has non-utf-8 bytes.

I ended up removing a test of ".rs file with invalid few utf-8 bytes" because md-docs is in a separate repo. I feel we should make the entire test suite more resilient to these, but I think I can make that test I added and removed be in run-make...

Comment on lines 77 to 81
let source_file = psess.source_map().load_file(path).unwrap_or_else(|e| {
let msg = format!("couldn't read {}: {}", path.display(), e);
let msg = format!("couldn't read `{}`: {}", path.display(), e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the error here could be an enum of an io error or a utf8 error, and store the information about the span and message to be reported, and that would probably help with deduplicating these code

@@ -210,8 +210,34 @@ pub(crate) fn expand_include_str(
MacEager::expr(cx.expr_str(cx.with_def_site_ctxt(bsp), interned_src))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe worth looking into unifying the logic here and the rustc_parse logic somehow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the logic to a separate function and called it from both places.

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 18, 2025
@rust-log-analyzer

This comment has been minimized.

@estebank estebank added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 18, 2025
Copy link
Member

@fee1-dead fee1-dead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r=me after squashing

```
error: couldn't read `$DIR/not-utf8-bin-file.rs`: stream did not contain valid UTF-8
  --> $DIR/not-utf8-2.rs:6:5
   |
LL |     include!("not-utf8-bin-file.rs");
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
note: `[193]` is not valid utf-8
  --> $DIR/not-utf8-bin-file.rs:2:14
   |
LL |     let _ = "�|�␂!5�cc␕␂��";
   |              ^
   = note: this error originates in the macro `include` (in Nightly builds, run with -Z macro-backtrace for more info)
```

When we attempt to load a Rust source code file, if there is a OS file failure we try reading the file as bytes. If that succeeds we try to turn it into UTF-8. If *that* fails, we provide additional context about *where* the file has the first invalid UTF-8 character.

Fix rust-lang#76869.
estebank added a commit to estebank/rust that referenced this pull request Jan 22, 2025
```
error: couldn't read `$DIR/not-utf8-bin-file.rs`: stream did not contain valid UTF-8
  --> $DIR/not-utf8-2.rs:6:5
   |
LL |     include!("not-utf8-bin-file.rs");
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
note: byte `193` is not valid utf-8
  --> $DIR/not-utf8-bin-file.rs:2:14
   |
LL |     let _ = "�|�␂!5�cc␕␂��";
   |              ^
   = note: this error originates in the macro `include` (in Nightly builds, run with -Z macro-backtrace for more info)
```

CC rust-lang#76869, rust-lang#135557.
@estebank
Copy link
Contributor Author

@bors r=fee1-dead

@bors
Copy link
Collaborator

bors commented Jan 22, 2025

📌 Commit 57dd42d has been approved by fee1-dead

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 22, 2025
jieyouxu added a commit to jieyouxu/rust that referenced this pull request Jan 22, 2025
Point at invalid utf-8 span on user's source code

```
error: couldn't read `$DIR/not-utf8-bin-file.rs`: stream did not contain valid UTF-8
  --> $DIR/not-utf8-2.rs:6:5
   |
LL |     include!("not-utf8-bin-file.rs");
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
note: byte `193` is not valid utf-8
  --> $DIR/not-utf8-bin-file.rs:2:14
   |
LL |     let _ = "�|�␂!5�cc␕␂��";
   |              ^
   = note: this error originates in the macro `include` (in Nightly builds, run with -Z macro-backtrace for more info)
```

When we attempt to load a Rust source code file, if there is a OS file failure we try reading the file as bytes. If that succeeds we try to turn it into UTF-8. If *that* fails, we provide additional context about *where* the file has the first invalid UTF-8 character.

Fix rust-lang#76869.
bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 22, 2025
Rollup of 8 pull requests

Successful merges:

 - rust-lang#135557 (Point at invalid utf-8 span on user's source code)
 - rust-lang#135596 (Properly note when query stack is being cut off)
 - rust-lang#135638 (Make it possible to build GCC on CI)
 - rust-lang#135648 (support wasm inline assembly in `naked_asm!`)
 - rust-lang#135826 (Misc. `rustc_resolve` cleanups)
 - rust-lang#135827 (CI: free disk with in-tree script instead of GitHub Action)
 - rust-lang#135850 (Update the `wasm-component-ld` tool)
 - rust-lang#135855 (Only assert the `Parser` size on specific arches)

r? `@ghost`
`@rustbot` modify labels: rollup
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Jan 22, 2025
Point at invalid utf-8 span on user's source code

```
error: couldn't read `$DIR/not-utf8-bin-file.rs`: stream did not contain valid UTF-8
  --> $DIR/not-utf8-2.rs:6:5
   |
LL |     include!("not-utf8-bin-file.rs");
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
note: byte `193` is not valid utf-8
  --> $DIR/not-utf8-bin-file.rs:2:14
   |
LL |     let _ = "�|�␂!5�cc␕␂��";
   |              ^
   = note: this error originates in the macro `include` (in Nightly builds, run with -Z macro-backtrace for more info)
```

When we attempt to load a Rust source code file, if there is a OS file failure we try reading the file as bytes. If that succeeds we try to turn it into UTF-8. If *that* fails, we provide additional context about *where* the file has the first invalid UTF-8 character.

Fix rust-lang#76869.
bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 22, 2025
…iaskrgr

Rollup of 10 pull requests

Successful merges:

 - rust-lang#132983 (Edit dangling pointers )
 - rust-lang#133154 (Reword resolve errors caused by likely missing crate in dep tree)
 - rust-lang#135409 (Fix ICE-133117: multiple never-pattern arm doesn't have false_edge_start_block)
 - rust-lang#135557 (Point at invalid utf-8 span on user's source code)
 - rust-lang#135596 (Properly note when query stack is being cut off)
 - rust-lang#135794 (Detect missing fields with default values and suggest `..`)
 - rust-lang#135814 (ci: use ghcr buildkit image)
 - rust-lang#135826 (Misc. `rustc_resolve` cleanups)
 - rust-lang#135837 (Remove test panic from File::open)
 - rust-lang#135856 (Library: Finalize dyn compatibility renaming)

r? `@ghost`
`@rustbot` modify labels: rollup
@matthiaskrgr
Copy link
Member

@bors r-
#135891 (comment)

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jan 22, 2025
@matthiaskrgr
Copy link
Member

oops wrong pr
@bors r=fee1-dead

@bors
Copy link
Collaborator

bors commented Jan 22, 2025

📌 Commit 57dd42d has been approved by fee1-dead

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 22, 2025
@estebank
Copy link
Contributor Author

oops wrong pr

You had me scared for a second there "oh, no! what fresh hell did this utf-8 handling cause during rollup?" XD

bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 22, 2025
…iaskrgr

Rollup of 9 pull requests

Successful merges:

 - rust-lang#132983 (Edit dangling pointers )
 - rust-lang#135409 (Fix ICE-133117: multiple never-pattern arm doesn't have false_edge_start_block)
 - rust-lang#135557 (Point at invalid utf-8 span on user's source code)
 - rust-lang#135596 (Properly note when query stack is being cut off)
 - rust-lang#135794 (Detect missing fields with default values and suggest `..`)
 - rust-lang#135814 (ci: use ghcr buildkit image)
 - rust-lang#135826 (Misc. `rustc_resolve` cleanups)
 - rust-lang#135837 (Remove test panic from File::open)
 - rust-lang#135856 (Library: Finalize dyn compatibility renaming)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit b4266b0 into rust-lang:master Jan 23, 2025
6 checks passed
@rustbot rustbot added this to the 1.86.0 milestone Jan 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-compiletest Area: The compiletest test runner A-testsuite Area: The testsuite used to check the correctness of rustc A-tidy Area: The tidy tool S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ER] Line number for 'stream did not contain valid UTF-8' errors
7 participants