Skip to content

Conversation

carlopi
Copy link
Contributor

@carlopi carlopi commented Jun 5, 2025

Unsure on the implementation choice, I think adding a function on top is somewhat cleaner, but that might be me, feedback welcome.

Problem I had bumped in duckdb-wasm, where there would be some yet to be properly understood factor that would change the global static default locale to an unsupported one that would, via a SQL call to duckdb_tables end up calling a locale-influenced stringstream that would then thow.

I covered the places (together with a follow up PR on isspace) that were needed to have the standard set of tests pass even injecting a random locale from the one I had available via the following diff:

diff --git a/test/unittest.cpp b/test/unittest.cpp
index 1c04f9836b..a70847a2a6 100644
--- a/test/unittest.cpp
+++ b/test/unittest.cpp
@@ -30,6 +30,8 @@ int main(int argc, char *argv[]) {
        duckdb::unique_ptr<FileSystem> fs = FileSystem::CreateLocal();
        string test_directory = DUCKDB_ROOT_DIRECTORY;
 
+       std::locale::global(std::locale("ja_JP.SJIS"));
+
        int new_argc = 0;
        auto new_argv = duckdb::unique_ptr<char *[]>(new char *[argc]);
        for (int i = 0; i < argc; i++) {

TODO, but I think they could be submitted as follow ups, are increasing both testing and the surface moved to locale-independent functions.

@carlopi carlopi marked this pull request as ready for review June 5, 2025 11:28
@carlopi carlopi force-pushed the local_agnostic_streams branch from fb82064 to a5e4eff Compare June 5, 2025 11:31
@duckdb-draftbot duckdb-draftbot marked this pull request as draft June 5, 2025 11:32
@carlopi carlopi marked this pull request as ready for review June 5, 2025 11:32
@carlopi carlopi force-pushed the local_agnostic_streams branch 2 times, most recently from 833b2e4 to 1e00aa0 Compare June 5, 2025 11:38
@duckdb-draftbot duckdb-draftbot marked this pull request as draft June 5, 2025 11:39
@carlopi carlopi force-pushed the local_agnostic_streams branch from 1e00aa0 to 09736fd Compare June 5, 2025 22:58
@carlopi carlopi marked this pull request as ready for review June 5, 2025 22:58
@carlopi carlopi force-pushed the local_agnostic_streams branch from 09736fd to 3f7a785 Compare June 6, 2025 05:25
@duckdb-draftbot duckdb-draftbot marked this pull request as draft June 6, 2025 05:25
@carlopi carlopi marked this pull request as ready for review June 6, 2025 05:26
@carlopi carlopi force-pushed the local_agnostic_streams branch from 3f7a785 to c109ac0 Compare June 6, 2025 05:39
@duckdb-draftbot duckdb-draftbot marked this pull request as draft June 6, 2025 05:39
@carlopi carlopi marked this pull request as ready for review June 6, 2025 05:39
@carlopi
Copy link
Contributor Author

carlopi commented Jun 6, 2025

I think this should be good to be reviewed, name is a bit long maybe, unsure if there are preferred alternative implementations.

PR is not exaustive, there are still many potentially problematic usages in duckdb/duckdb or in external core extensions that would be nice to review and handle, plus adding a test mode.

In any case PR solves some of the problems, so should be already worth having in if implementation makes sense.

Mytherin added a commit that referenced this pull request Jun 6, 2025
Similar to #17807, this adds
locale-independent handling for isspace.

I have to say I don't remember exactly if I manage to reproduce a
problem with this or just looked right to have.

Feedback on the implementation welcome.
@Mytherin
Copy link
Collaborator

Mytherin commented Jun 6, 2025

Thanks for the PR! Instead of requiring people to use the GetLocaleAgnostic function - let's add duckdb::stringstream that "does the right thing" automatically.

@carlopi
Copy link
Contributor Author

carlopi commented Jun 6, 2025

Sure, I tried (and failed..) the easy way.

@carlopi carlopi marked this pull request as draft June 6, 2025 11:49
Mytherin added a commit that referenced this pull request Jun 12, 2025
Improved version of #17807, using
infrastructure from #17866 to check
`std::stringstream` are not used anymore at compile time.

Original problem I had bumped in duckdb-wasm, where there would be some
yet to be properly understood factor that would change the global static
default locale to an unsupported one that would, via a SQL call to
`duckdb_tables` end up calling a locale-influenced stringstream that
would then throw.

Testing can also be done on native applying the following diff, yet to
be added as testing mode
```diff
diff --git a/test/unittest.cpp b/test/unittest.cpp
index 1c04f98..a70847a2a6 100644
--- a/test/unittest.cpp
+++ b/test/unittest.cpp
@@ -30,6 +30,8 @@ int main(int argc, char *argv[]) {
        duckdb::unique_ptr<FileSystem> fs = FileSystem::CreateLocal();
        string test_directory = DUCKDB_ROOT_DIRECTORY;
 
+       std::locale::global(std::locale("ja_JP.SJIS"));
+
        int new_argc = 0;
        auto new_argv = duckdb::unique_ptr<char *[]>(new char *[argc]);
        for (int i = 0; i < argc; i++) {
```

Also cleans up comment from previous PR review at
#17866 (comment)

Note that other streams might (and will) have the same problems, so it
would be handy to expand this further, here taking it one step further.
@carlopi
Copy link
Contributor Author

carlopi commented Jun 12, 2025

Handled via #17894, closing.

@carlopi carlopi closed this Jun 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants