-
Notifications
You must be signed in to change notification settings - Fork 37.7k
Modernize use of UTF-8 in Windows code #32380
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32380. ReviewsSee the guideline for information on the review process. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
3b2da12
to
942cea7
Compare
Rebased on #32383. |
942cea7
to
9ee993c
Compare
After this we should revert our custom leveldb unicode patch. This would make the Edit: After this, i could find the following uses of Windows -W WCHAR functions remaining:
|
49b1cca
to
48f1e8e
Compare
Please review #32396 first. |
48f1e8e
to
7bc30de
Compare
Rebased on #32396. |
Guix builds (on x86_64) [untrusted test-only build, possibly unsafe, not for production use] |
7bc30de
to
6fccd25
Compare
…or Windows 8f4fed7 symbol-check: Add check for application manifest in Windows binaries (Hennadii Stepanov) 2bb6ab8 ci: Add "Get bitcoind manifest" steps to Windows CI jobs (Hennadii Stepanov) 282b491 cmake: Add application manifests when cross-compiling for Windows (Hennadii Stepanov) Pull request description: Windows [application manifests ](https://learn.microsoft.com/en-us/windows/win32/sbscs/application-manifests) provide several benefits—such as enhanced security settings, and the ability to set a process-wide code page (required for #32380), as well as granular control over supported Windows versions. Most of these benefits lie beyond the scope of this PR and will be evaluated separately. On the current master branch @ fc6346d, the linker generates and embeds a manifest only when building with MSVC: ```xml <?xml version="1.0" encoding="UTF-8" standalone="yes"?> <assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0"> <trustInfo xmlns="urn:schemas-microsoft-com:asm.v3"> <security> <requestedPrivileges> <requestedExecutionLevel level="asInvoker" uiAccess="false"></requestedExecutionLevel> </requestedPrivileges> </security> </trustInfo> </assembly> ``` However, this manifest fails validation: ``` > mt.exe -nologo -inputresource:build\bin\Release\bitcoind.exe -validate_manifest mt.exe : general error 10100ba: The manifest is missing the definition identity. ``` This PR unifies manifest embedding for both native and cross-compilation builds. Here is the change in the manifest on Windows: ```diff --- bitcoind-master.manifest +++ bitcoind-pr.manifest @@ -1,5 +1,6 @@ <?xml version="1.0" encoding="UTF-8" standalone="yes"?> <assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0"> + <assemblyIdentity type="win32" name="org.bitcoincore.bitcoind" version="29.99.0.0"></assemblyIdentity> <trustInfo xmlns="urn:schemas-microsoft-com:asm.v3"> <security> <requestedPrivileges> ``` which effectively resolves the "missing the definition identity" error. Finally, “Get bitcoind manifest” steps have been added to the Windows CI jobs to ensure the manifest is embedded and validated. ACKs for top commit: sipsorcery: re-tACK 8f4fed7. hodlinator: re-ACK 8f4fed7 davidgumberg: Reviewed and tested ACK 8f4fed7 Tree-SHA512: 6e2dbdc77083eafdc242410eb89a6678e37b11efd786505dcd7844f0bac8f44d68625e62924a03b26549bdb4aaec5330dc608e6b4d66789f0255092e23aef6cb
bdfe5cc
to
35fbe50
Compare
35fbe50
to
95f86f3
Compare
|
95f86f3
to
9337cb5
Compare
9337cb5
to
b158b72
Compare
Bitcoin Core is supported and tested on operating systems using the | ||
Linux Kernel 3.17+, macOS 13+, and Windows 10+. Bitcoin | ||
Bitcoin Core is supported and tested on the following operating systems or newer: | ||
Linux Kernel 3.17, macOS 13, and Windows 10 (version 1903). Bitcoin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is anything enforcing this, or is the assumption that any Windows 10 system is at least this version? What happens if it isn't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is anything enforcing this, or is the assumption that any Windows 10 system is at least this version?
The latter. My attempt to enforce it was based on a wrong assumption.
What happens if it isn't?
My guess is that the application won't properly handle any symbols outside its default code page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we maybe check the codepage early at startup (e.g. SetupEnvironment()
) and fail if it isn't correct? This seems more thorough than any version check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we maybe check the codepage early at startup (e.g.
SetupEnvironment()
) and fail if it isn't correct? This seems more thorough than any version check.
Thanks! Done.
This version is the minimum required to support the UTF-8 code page (CP_UTF8).
Additionally, this change adds app manifests to targets that were previously missing them.
This change removes one use case of `std::wstring_convert`, which is deprecated in C++17 and removed in C++26. Other uses remain for now.
b158b72
to
e3719ef
Compare
e3719ef
to
bd43cf8
Compare
Changes overlapping with #32566 have been dropped. |
<assemblyIdentity | ||
type="win32" | ||
name="org.bitcoincore.${target}" | ||
version="${CLIENT_VERSION_MAJOR}.${CLIENT_VERSION_MINOR}.${CLIENT_VERSION_BUILD}.0" | ||
/> | ||
<asmv3:application> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In 1e826bf: Can the commit be updated to better explain why this is needed. Looking at mingw-w64, GetACP
has returned CP_UTF8
since it was introduced. I'm guessing the same is happening with MSVC (although that's less relevant given it doesn't effect releases)? Given that, and the fact that according to https://github.com/bitcoin/bitcoin/pull/32380/files#r2096044476, we are just going to assume that a user is running a new enough version of Windows, and throw an assertion if the codepage isn't CP_UTF8
, what is the app-manifest actually doing? Given the version setting from #32537 also didn't work, it's not clear why any of this is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not how it's supposed to be. The default application codepage is something that the user can configure system-wide (there should be a checkbox: https://exploratory.io/note/exploratory/Enabling-UTF-8-on-Windows-hYc3yWL0).
The purpose of the application manifest is to override it so it is always UTF-8 for the application. Exactly what the microsoft documentation says here: https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page
This hasn't changed with newer windows versions, as far as i know, it's just that there is no usable UTF-8 codepage in earlier ones, so it cannot be set there.
I'm guessing the same is happening with MSVC (although that's less relevant given it doesn't effect releases)?
It's absolutely relevant to MSVC-build versions, even if it doesn't affect the release. If the code assumes the file path locale is UTF-8 and it isn't all kinds of ugly filename breakage will happen.
If GetACP
always returns CP_UTF8
then i agree the check is meaningless, and we should find a meaningful check for "path codepage is UTF-8".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In 1e826bf: Can the commit be updated to better explain why this is needed. Looking at mingw-w64,
GetACP
has returnedCP_UTF8
since it was introduced. I'm guessing the same is happening with MSVC (although that's less relevant given it doesn't effect releases)?
GetACP
being compiled on Windows returns 1252, which is not the UTF-8 code page. And, basically, it would depend on the OS language settings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at mingw-w64,
GetACP
has returnedCP_UTF8
since it was introduced.
I see it differently. I've cross-compiled this code:
#include <iostream>
#include <windows.h>
int main()
{
std::cout << GetACP() << "\n";
}
and it prints 1252 on Windows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fwiw: GetACP
is a windows system function implemented in kernel32.dll
, defined in the winnls.h
header. There shouldn't be a mingw-specific definition.
CP_UTF8
is also defined in that header and has the following value:
#define CP_UTF8 65001
1252
is codepage windows-1252, a legacy encoding.
The main goal is to remove deprecated code (removed in C++26).
This PR employs Microsoft's modern approach to handling UTF-8:
TODO:
std:wstring_convert
.subprocess.h
will be addressed in a follow-up PR, as additional tests are likely needed.common/system.cpp
is handled in Use subprocess library for notifications #32566.