Skip to content

Conversation

ojwb
Copy link
Member

@ojwb ojwb commented Aug 17, 2024

Improve handling of zero bytes in string literals

Many of the target languages don't handle this currently. Those
that don't support this in their strings never will, but others
can probably be fixed (e.g. I've fixed this for SWIG/PHP so far).

Fixes #2996

@ojwb
Copy link
Member Author

ojwb commented Aug 17, 2024

This fixes:

  • the scanner now returns string literals as String* rather than const char*
  • string literal concatenation no longer truncates at zero bytes
  • PHP string literals are now created to avoid truncating at zero bytes

Zero bytes now work end to end (at least for #define CONST_STRING4 "zer\0zer\0") for C#, D, Go, Java and PHP. For C#, D and I think also Java they don't work unless emitted as target language literals (%csconst/%dmanifestconst/%javaconst) because the mechanism to pass them from C/C++ to the target language treats them as const char*.

D testing isn't hooked up in the branch as currently pushed.

Go isn't automatically tested but the generated Go wrapper looks good: const CONST_STRING4 string = "zer\x00zer\x00"

They don't currently work for Guile, Lua, mzscheme, Ocaml, Octave, Perl, Python, Ruby, Scilab, Tcl.

R doesn't seem to wrap these constants at all so it's hard to tell if they would be mishandled.

I don't think I'm going to attempt to fix all the target languages - if I fix the generic machinery and establish which target languages work (and ensure there's test coverage for those which do) then the target language maintainers (or users who care) can fix target languages which can handle strings with zero bytes.

I think we ought to have tests for such strings in other situations though.

@ojwb
Copy link
Member Author

ojwb commented Aug 18, 2024

The fix for PHP breaks other cases (which I missed locally despite testing - I guess the C++11 tests didn't get run):

cpp11_decltype_wrap.cxx: In function ‘zend_result zm_startup_cpp11_decltype(int, int)’:
cpp11_decltype_wrap.cxx:3426:40: error: initializer fails to determine size of ‘swig_string’
 3426 |   static const char swig_string[] = B::should_be_string;
      |                                     ~~~^~~~~~~~~~~~~~~~
cpp11_decltype_wrap.cxx:3426:40: error: array must be initialized with a brace-enclosed initializer

For PHP we need a way to emit different code for the case where the constant value is a string literal. One current quirk of SWIG doesn't help here - string literals should really have type const char[] but SWIG seems to treat them as const char* - if that was fixed we could have a different consttab typemap for the char* and char[] cases. Maybe fixing that is the way to go, but I think I might just revert the PHP fix and leave it as a language where this isn't currently supported.

ojwb added 4 commits August 29, 2024 09:33
Many of the target languages don't handle this currently.  Those
that don't support this in their strings never will, but others
can probably be fixed (e.g. I've fixed this for SWIG/PHP so far).

Fixes swig#2996
@ojwb ojwb force-pushed the string-literals-with-zero-bytes branch from 422856b to 3f2dfd8 Compare August 28, 2024 21:37
@ojwb
Copy link
Member Author

ojwb commented Aug 28, 2024

Rebased onto current master and added a draft CHANGES.current entry. Also:

D testing isn't hooked up in the branch as currently pushed.

Done.

I think I might just revert the PHP fix and leave it as a language where this isn't currently supported.

Done.

@ojwb
Copy link
Member Author

ojwb commented Aug 29, 2024

Python test failures with MSVC don't seem connected to my changes, but the last CI build of git master didn't trigger these:

li_std_vector_enum_wrap.cxx(6666): error C2275: 'std::vector<EnumVector::numbers,std::allocator<EnumVector::numbers>>::value_type': expected an expression instead of a type
li_std_vector_enum_wrap.cxx(6666): note: prefix the qualified-id with 'typename' to indicate a type
li_std_vector_enum_wrap.cxx(6666): error C2923: 'swig::traits': 'std::vector<EnumVector::numbers,std::allocator<EnumVector::numbers>>::value_type' is not a valid template type argument for parameter 'Type'
li_std_vector_enum_wrap.cxx(6666): note: see declaration of 'std::vector<EnumVector::numbers,std::allocator<EnumVector::numbers>>::value_type'
li_std_vector_enum_wrap.cxx(6666): error C2955: 'swig::traits': use of class template requires template argument list
li_std_vector_enum_wrap.cxx(3915): note: see declaration of 'swig::traits'
li_std_vector_enum_wrap.cxx(6666): warning C4346: 'swig::traits::category': dependent name is not a type
li_std_vector_enum_wrap.cxx(6666): note: prefix the qualified-id with 'typename' to indicate a type
li_std_vector_enum_wrap.cxx(6666): error C2923: 'swig::container_owner': 'swig::traits::category' is not a valid template type argument for parameter 'T'
li_std_vector_enum_wrap.cxx(6666): note: see declaration of 'swig::traits::category'
li_std_vector_enum_wrap.cxx(6666): error C2955: 'swig::container_owner': use of class template requires template argument list
li_std_vector_enum_wrap.cxx(5011): note: see declaration of 'swig::container_owner'

@ojwb
Copy link
Member Author

ojwb commented Aug 29, 2024

I think we ought to have tests for such strings in other situations though.

The other relevant situation is default arguments, but actually there a string literal with a zero byte gets truncated in C++ anyway so there isn't anything we can usefully test, consider:

std::string f(std::string s = "zer\0zer\0") { return s; }

f() returns std::string("zer").

@ojwb
Copy link
Member Author

ojwb commented Aug 29, 2024

The CI failure seems to be due to an MSVC change so I'm going to ignore that and merge this.

@ojwb ojwb closed this in 61e54ab Aug 29, 2024
@ojwb ojwb deleted the string-literals-with-zero-bytes branch August 29, 2024 23:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sort out handling of string literals with embedded zero bytes
1 participant