Skip to content

Conversation

Tishj
Copy link
Contributor

@Tishj Tishj commented Dec 14, 2022

This PR fixes #5670

As Mytherin noted, the issue itself is fixed in #5691, so this PR only adds a test for the issue mentioned.

@Tishj
Copy link
Contributor Author

Tishj commented Dec 14, 2022

I'll look to see if this still fails after Mytherins PR is merged

@cookiejest
Copy link

Hey! Thanks for the above work everyone. Is this one gunna be fixed in the next release then?

@Mause
Copy link
Member

Mause commented Jan 3, 2023

Hey! Thanks for the above work everyone. Is this one gunna be fixed in the next release then?

The fix itself is already included in the latest dev builds

@chrisbrain
Copy link
Contributor

I have just tried running the test on:
#5670

With the latest dev build, which I assume is what is installed via:
npm install duckdb@next

And it still fails in the same way. Are we sure this issue has been addressed?

Could we get this PR merged to confirm it really is fixed on master branch @Mytherin / @Mause?

Copy link
Collaborator

@Mytherin Mytherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somehow the CI never ran here - will try to rerun it

@chrisbrain
Copy link
Contributor

@Mytherin - Thanks, if I am reading it correctly though the action failed on this test:
image

@Tishj
Copy link
Contributor Author

Tishj commented Jan 12, 2023

Yea it seems this wasn't fixed by the other PR and the issue still exists

@Tishj
Copy link
Contributor Author

Tishj commented Feb 9, 2023

So the issue seems to be that we use the c_str() of a temporary string, which goes out of scope and the memory becomes random.

		case duckdb::LogicalTypeId::BLOB:
		case duckdb::LogicalTypeId::VARCHAR: {
			auto data = ret.Get("data").As<Napi::Array>();
			auto out = duckdb::FlatVector::GetData<duckdb::string_t>(*jsargs->result);
			for (size_t i = 0; i < data.Length(); ++i) {
				out[i] = duckdb::string_t(data.Get(i).ToString());
			}
			break;
		}

This ends up in this constructor

	string_t(const string &value)
	    : string_t(value.c_str(), value.size()) { // NOLINT: Allow implicit conversion from `const char*`
	}

And for non-inlined strings we just copy the pointer

			// large string: store pointer
#ifndef DUCKDB_DEBUG_NO_INLINE
			memcpy(value.pointer.prefix, data, PREFIX_LENGTH);
#else
			memset(value.pointer.prefix, 0, PREFIX_BYTES);
#endif
			value.pointer.ptr = (char *)data;
		}

Tishj added 4 commits February 9, 2023 09:55
…rary string that goes out of scope, this memory becomes random. This is what we have a StringHeap for, using AddString we store non-inlined strings in there, so the memory doesn't go out of scope
@Tishj
Copy link
Contributor Author

Tishj commented Feb 9, 2023

This change should fix the issue 👍

StringVector's have a StringHeap exactly for this purpose, we need to make sure the memory of these non-inlined strings stays alive long enough, so we copy over the memory into this heap and the data of the vector contains a string_t that points into this heap.

@Mytherin
Copy link
Collaborator

Mytherin commented Feb 9, 2023

Thanks! Looks good to me.

@Mytherin Mytherin merged commit 9a82af3 into duckdb:master Feb 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

'Invalid unicode (byte sequence mismatch) detected in value construction' for JS UDF returning more than 12 characters
5 participants