Skip to content

Conversation

carlopi
Copy link
Contributor

@carlopi carlopi commented Jun 5, 2025

Similar to #17807, this adds locale-independent handling for isspace.

I have to say I don't remember exactly if I manage to reproduce a problem with this or just looked right to have.

Feedback on the implementation welcome.

@carlopi carlopi marked this pull request as ready for review June 5, 2025 11:30
@carlopi carlopi changed the title local_agonstic::isspace to avoid spaces be depending on locale local_agnostic::isspace to avoid spaces be depending on locale Jun 5, 2025
@Mytherin
Copy link
Collaborator

Mytherin commented Jun 5, 2025

Thanks! Can we use StringUtil::CharacterIsSpace instead?

@Mytherin
Copy link
Collaborator

Mytherin commented Jun 5, 2025

I actually ran into an issue with std::isspace on Windows as well when testing there, so I think this is great to rip out.

@staticlibs
Copy link
Contributor

staticlibs commented Jun 5, 2025

Don't know how this function is used, so this comment may be off. Just perhaps it is better to check explicitly for the set of space characters defined in the parser (like scanner_isspace in Postgres). In case when the parser and C stdlib disagree what is space and what is not.

edit: sent this before seeing the comment above, apparently CharacterIsSpace is doing that.

@carlopi carlopi force-pushed the local_agnostic_isspace branch from 193ff2c to 4552235 Compare June 6, 2025 05:24
@duckdb-draftbot duckdb-draftbot marked this pull request as draft June 6, 2025 05:25
@carlopi carlopi marked this pull request as ready for review June 6, 2025 05:26
@carlopi
Copy link
Contributor Author

carlopi commented Jun 6, 2025

Thanks for the comments, isspace + explicit classic locale, scanner_isspace or StringUtil::CharacterIsSpace all have the same logic for which characters to accept, I changed to StringUtil::CharacterIsSpace.

There are still a couple of mentions in the shell and a bunch in icu code, those might be also worth tackling in a follow up.

https://github.com/search?q=org%3Aduckdb%20isspace&type=code shows default isspace is used in other extensions, so that might also be worth a look.

@Mytherin Mytherin merged commit 357c63e into duckdb:v1.3-ossivalis Jun 6, 2025
52 checks passed
@Mytherin
Copy link
Collaborator

Mytherin commented Jun 6, 2025

Thanks!

github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request Jun 6, 2025
local_agnostic::isspace to avoid spaces be depending on locale (duckdb/duckdb#17808)
github-actions bot added a commit to duckdb/duckdb-r that referenced this pull request Jun 6, 2025
local_agnostic::isspace to avoid spaces be depending on locale (duckdb/duckdb#17808)

Co-authored-by: krlmlr <krlmlr@users.noreply.github.com>
@carlopi carlopi deleted the local_agnostic_isspace branch August 17, 2025 20:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants