Skip to content

Conversation

pdet
Copy link
Contributor

@pdet pdet commented Feb 10, 2025

There are mainly two fixes in this PR:

  1. If buffer_size = 1, this could cause an error when calculating buffer sizes for decoding.
  2. If the buffer size is set, the maximum line size should be equal to or smaller than the buffer size (a line can never be larger than the buffer). I've adjusted the code to handle this automatically or throw errors if the user manually sets both options, creating conflicts. Since the maximum line size is now adjusted based on the buffer size, I also had to increase a few buffer size tests that would otherwise break because they were slightly too small (e.g., not accounting for newline characters like \r\n).

Fix: https://github.com/duckdblabs/duckdb-internal/issues/4172

@duckdb-draftbot duckdb-draftbot marked this pull request as draft February 10, 2025 17:11
@pdet pdet marked this pull request as ready for review February 10, 2025 17:13
@c-herrewijn
Copy link
Member

I took the liberty to rerun the fuzzer based on the PR, but it seems it still finds crashes / hangs.
Can you try the attached sqllogic tests?

crashes
fuzz_20250211_crash.test.zip
data

hangs
fuzz_20250211_hangs.test.zip
data

@pdet
Copy link
Contributor Author

pdet commented Feb 12, 2025

I took the liberty to rerun the fuzzer based on the PR, but it seems it still finds crashes / hangs. Can you try the attached sqllogic tests?

crashes fuzz_20250211_crash.test.zip data

hangs fuzz_20250211_hangs.test.zip data

Hey Chris, thanks for testing this again.

Indeed that were a couple more crashes from
crashes fuzz_20250211_crash.test.zip data.

The hangs are more complicated, we can't really early error on them because they have new line delimiters, so we end up scanning a lot. All of them finish if given enough time (About 6 minutes on release mode on my mac M1 max). One idea for the future is to also limit the sniffer in terms of Mb, but might be an overkill to be able to error faster on borked gzipped CSV files.

@duckdb-draftbot duckdb-draftbot marked this pull request as draft February 12, 2025 12:03
@pdet pdet marked this pull request as ready for review February 12, 2025 12:37
@duckdb-draftbot duckdb-draftbot marked this pull request as draft February 12, 2025 13:27
@pdet pdet marked this pull request as ready for review February 12, 2025 13:34
@duckdb-draftbot duckdb-draftbot marked this pull request as draft February 12, 2025 20:35
@pdet pdet marked this pull request as ready for review February 12, 2025 20:51
@pdet pdet marked this pull request as draft February 13, 2025 10:51
@pdet pdet marked this pull request as ready for review February 13, 2025 10:52
@Mytherin
Copy link
Collaborator

Thanks for the PR!

Code changes looks good, two questions:

  • Do we need to check in all 100 files? Most files seem to target the same underlying issue/problem, no?
  • Should the fixes here target v1.2 instead of main (v1.3)?

@pdet pdet changed the base branch from main to v1.2-histrionicus February 17, 2025 18:14
@pdet pdet changed the base branch from v1.2-histrionicus to main February 17, 2025 18:15
Mytherin added a commit that referenced this pull request Feb 18, 2025
Basically cherry-picking the commits of
#16154 to the `v1.2` branch, and
reducing the number of tests to a minimal set.
@Mytherin
Copy link
Collaborator

Superseded by #16280

@Mytherin Mytherin closed this Feb 18, 2025
@pdet pdet deleted the afl_feb branch May 28, 2025 11:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants