Skip to content

Conversation

hebasto
Copy link
Member

@hebasto hebasto commented Apr 29, 2025

The main goal is to remove deprecated code (removed in C++26).

This PR employs Microsoft's modern approach to handling UTF-8:

Until recently, Windows has emphasized "Unicode" -W variants over -A APIs. However, recent releases have used the ANSI code page and -A APIs as a means to introduce UTF-8 support to apps. If the ANSI code page is configured for UTF-8, then -A APIs typically operate in UTF-8. This model has the benefit of supporting existing code built with -A APIs without any code changes.

TODO:

  • Handle application manifests properly when building with MSVC.
  • Bump the minimum supported Windows version to 1903 (May 2019 Update).
  • Remove all remaining use cases of the deprecated std:wstring_convert.

@DrahtBot
Copy link
Contributor

DrahtBot commented Apr 29, 2025

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32380.

Reviews

See the guideline for information on the review process.
A summary of reviews will appear here.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #33247 (build: Remove deprecated CMAKE_SKIP_BUILD_RPATH and SKIP_BUILD_RPATH settings by 151henry151)
  • #32951 (build: Explicitly set Qt's AUTO{MOC,RCC,UIC} property per target by hebasto)
  • #31723 (qa debug: Add --debug_runs/-waitfordebugger [DRAFT] by hodlinator)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@hebasto
Copy link
Member Author

hebasto commented Apr 30, 2025

Rebased on #32383.

@laanwj
Copy link
Member

laanwj commented May 1, 2025

After this we should revert our custom leveldb unicode patch. This would make the env_windows backend go back to using the A functions, which then take UTF-8 as-is.

Edit: After this, i could find the following uses of Windows -W WCHAR functions remaining:

  • LevelDB (see above)
  • subprocess: CreateProcessW (upstream issue, i guess)
  • src/random.cpp: CryptAcquireContextW - not actually passing strings, so no problem. Mind this function is deprecated for out-of-scope reasons: Use of deprecated CryptAcquireContext in random.cpp #32391
  • src/util/fs.cpp: CreateFileW - converts using file.wstring().c_str()
  • src/util/fs_helpers.cpp: SHGetSpecialFolderPathW - converts using fs::path(WCHAR*)
  • src/qt/guiutil.cpp: GetModuleFileNameW - result passed directly into ISHellLinkW interface, might be better to leave this code (SetStartOnSystemStartup) as-is
  • src/qt/guiutil.cpp: PathRemoveFileSpecW - idem

@hebasto
Copy link
Member Author

hebasto commented May 1, 2025

Please review #32396 first.

@hebasto hebasto force-pushed the 250429-windows-utf8 branch from 48f1e8e to 7bc30de Compare May 1, 2025 15:35
@hebasto
Copy link
Member Author

hebasto commented May 1, 2025

Rebased on #32396.

@DrahtBot
Copy link
Contributor

DrahtBot commented May 5, 2025

Guix builds (on x86_64) [untrusted test-only build, possibly unsafe, not for production use]

File commit eba5f9c
(master)
commit d039c6a
(pull/32380/merge)
*-aarch64-linux-gnu-debug.tar.gz dd18c4706ae0a247... 86637bd208268bb1...
*-aarch64-linux-gnu.tar.gz 4216933d26250c15... e5bbf8a01575dffd...
*-arm-linux-gnueabihf-debug.tar.gz f972e0bd2981ac67... 8afd5f526dc8f4cf...
*-arm-linux-gnueabihf.tar.gz 3ca06111b5a591e8... d4c0fd8d5ce3ba86...
*-arm64-apple-darwin-codesigning.tar.gz 8a9c416ffe2e3d36... 0697f6515271ae7d...
*-arm64-apple-darwin-unsigned.tar.gz 216c1a1f68356c92... 9bca33ae11db549e...
*-arm64-apple-darwin-unsigned.zip 1f4d27d16e6c3eea... 4a745d1d4ebdf725...
*-powerpc64-linux-gnu-debug.tar.gz 8900b88ab21f9f48... 1eb495cd89d88ecd...
*-powerpc64-linux-gnu.tar.gz c7ed6137d21d3e52... 92b05ec45b154612...
*-riscv64-linux-gnu-debug.tar.gz aad7744d23e2a136... 6f5ce4a909e4ea0c...
*-riscv64-linux-gnu.tar.gz d49be6063aa6a124... 0af7c1a58a6167f9...
*-x86_64-apple-darwin-codesigning.tar.gz c8973dacfc9e519c... 8f70bddec78726b2...
*-x86_64-apple-darwin-unsigned.tar.gz 9ca6afb860f0b246... 6525d1b7cde740ba...
*-x86_64-apple-darwin-unsigned.zip a364bd58765172cc... 174395eb35a55891...
*-x86_64-linux-gnu-debug.tar.gz c11030172812a7c1... 3f1a15b4c2c184f4...
*-x86_64-linux-gnu.tar.gz f288901ecf194ab8... aa602ac4c259b746...
*.tar.gz 82d1cd8e8b352d6b... 26375d232dfe0d5d...
SHA256SUMS.part a0c3bbbfdd25fd25... fffa601b9ae5c99f...
guix_build.log d4d91f9e2f775124... 21db7fa459cab888...
guix_build.log.diff 2dadd820439d138b...

@hebasto hebasto force-pushed the 250429-windows-utf8 branch from 7bc30de to 6fccd25 Compare May 13, 2025 10:28
fanquake added a commit that referenced this pull request May 16, 2025
…or Windows

8f4fed7 symbol-check: Add check for application manifest in Windows binaries (Hennadii Stepanov)
2bb6ab8 ci: Add "Get bitcoind manifest" steps to Windows CI jobs (Hennadii Stepanov)
282b491 cmake: Add application manifests when cross-compiling for Windows (Hennadii Stepanov)

Pull request description:

  Windows [application manifests ](https://learn.microsoft.com/en-us/windows/win32/sbscs/application-manifests) provide several benefits—such as enhanced security settings, and the ability to set a process-wide code page (required for #32380), as well as granular control over supported Windows versions. Most of these benefits lie beyond the scope of this PR and will be evaluated separately.

  On the current master branch @ fc6346d, the linker generates and embeds a manifest only when building with MSVC:
  ```xml
  <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
  <assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
    <trustInfo xmlns="urn:schemas-microsoft-com:asm.v3">
      <security>
        <requestedPrivileges>
          <requestedExecutionLevel level="asInvoker" uiAccess="false"></requestedExecutionLevel>
        </requestedPrivileges>
      </security>
    </trustInfo>
  </assembly>
  ```

  However, this manifest fails validation:
  ```
  > mt.exe -nologo -inputresource:build\bin\Release\bitcoind.exe -validate_manifest

  mt.exe : general error 10100ba: The manifest is missing the definition identity.
  ```

  This PR unifies manifest embedding for both native and cross-compilation builds.

  Here is the change in the manifest on Windows:
  ```diff
  --- bitcoind-master.manifest
  +++ bitcoind-pr.manifest
  @@ -1,5 +1,6 @@
   <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
   <assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
  +  <assemblyIdentity type="win32" name="org.bitcoincore.bitcoind" version="29.99.0.0"></assemblyIdentity>
     <trustInfo xmlns="urn:schemas-microsoft-com:asm.v3">
       <security>
         <requestedPrivileges>
  ```

  which effectively resolves the "missing the definition identity" error.

  Finally, “Get bitcoind manifest” steps have been added to the Windows CI jobs to ensure the manifest is embedded and validated.

ACKs for top commit:
  sipsorcery:
    re-tACK 8f4fed7.
  hodlinator:
    re-ACK 8f4fed7
  davidgumberg:
    Reviewed and tested ACK 8f4fed7

Tree-SHA512: 6e2dbdc77083eafdc242410eb89a6678e37b11efd786505dcd7844f0bac8f44d68625e62924a03b26549bdb4aaec5330dc608e6b4d66789f0255092e23aef6cb
@hebasto hebasto force-pushed the 250429-windows-utf8 branch 2 times, most recently from bdfe5cc to 35fbe50 Compare May 16, 2025 11:27
@hebasto hebasto changed the title [PoC] Modernize use of UTF-8 in Windows code Modernize use of UTF-8 in Windows code May 16, 2025
@hebasto hebasto force-pushed the 250429-windows-utf8 branch from 35fbe50 to 95f86f3 Compare May 16, 2025 14:21
@hebasto
Copy link
Member Author

hebasto commented May 16, 2025

Rebased on #32537.

@hebasto hebasto force-pushed the 250429-windows-utf8 branch from 95f86f3 to 9337cb5 Compare May 18, 2025 13:42
@hebasto hebasto force-pushed the 250429-windows-utf8 branch from 9337cb5 to b158b72 Compare May 19, 2025 15:12
@hebasto hebasto marked this pull request as ready for review May 19, 2025 15:13
Bitcoin Core is supported and tested on operating systems using the
Linux Kernel 3.17+, macOS 13+, and Windows 10+. Bitcoin
Bitcoin Core is supported and tested on the following operating systems or newer:
Linux Kernel 3.17, macOS 13, and Windows 10 (version 1903). Bitcoin
Copy link
Member

@fanquake fanquake May 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is anything enforcing this, or is the assumption that any Windows 10 system is at least this version? What happens if it isn't?

Copy link
Member Author

@hebasto hebasto May 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is anything enforcing this, or is the assumption that any Windows 10 system is at least this version?

The latter. My attempt to enforce it was based on a wrong assumption.

What happens if it isn't?

My guess is that the application won't properly handle any symbols outside its default code page.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we maybe check the codepage early at startup (e.g. SetupEnvironment()) and fail if it isn't correct? This seems more thorough than any version check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we maybe check the codepage early at startup (e.g. SetupEnvironment()) and fail if it isn't correct? This seems more thorough than any version check.

Thanks! Done.

hebasto added 4 commits May 20, 2025 12:29
This version is the minimum required to support the UTF-8 code page
(CP_UTF8).
Additionally, this change adds app manifests to targets that were
previously missing them.
This change removes one use case of `std::wstring_convert`, which is
deprecated in C++17 and removed in C++26. Other uses remain for now.
@hebasto hebasto force-pushed the 250429-windows-utf8 branch from b158b72 to e3719ef Compare May 20, 2025 11:41
@hebasto hebasto force-pushed the 250429-windows-utf8 branch from e3719ef to bd43cf8 Compare May 20, 2025 14:01
@hebasto
Copy link
Member Author

hebasto commented May 20, 2025

Changes overlapping with #32566 have been dropped.

<assemblyIdentity
type="win32"
name="org.bitcoincore.${target}"
version="${CLIENT_VERSION_MAJOR}.${CLIENT_VERSION_MINOR}.${CLIENT_VERSION_BUILD}.0"
/>
<asmv3:application>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 1e826bf: Can the commit be updated to better explain why this is needed. Looking at mingw-w64, GetACP has returned CP_UTF8 since it was introduced. I'm guessing the same is happening with MSVC (although that's less relevant given it doesn't effect releases)? Given that, and the fact that according to https://github.com/bitcoin/bitcoin/pull/32380/files#r2096044476, we are just going to assume that a user is running a new enough version of Windows, and throw an assertion if the codepage isn't CP_UTF8, what is the app-manifest actually doing? Given the version setting from #32537 also didn't work, it's not clear why any of this is needed.

Copy link
Member

@laanwj laanwj May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not how it's supposed to be. The default application codepage is something that the user can configure system-wide (there should be a checkbox: https://exploratory.io/note/exploratory/Enabling-UTF-8-on-Windows-hYc3yWL0).

The purpose of the application manifest is to override it so it is always UTF-8 for the application. Exactly what the microsoft documentation says here: https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page

This hasn't changed with newer windows versions, as far as i know, it's just that there is no usable UTF-8 codepage in earlier ones, so it cannot be set there.

I'm guessing the same is happening with MSVC (although that's less relevant given it doesn't effect releases)?

It's absolutely relevant to MSVC-build versions, even if it doesn't affect the release. If the code assumes the file path locale is UTF-8 and it isn't all kinds of ugly filename breakage will happen.

If GetACP always returns CP_UTF8 then i agree the check is meaningless, and we should find a meaningful check for "path codepage is UTF-8".

Copy link
Member Author

@hebasto hebasto May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 1e826bf: Can the commit be updated to better explain why this is needed. Looking at mingw-w64, GetACP has returned CP_UTF8 since it was introduced. I'm guessing the same is happening with MSVC (although that's less relevant given it doesn't effect releases)?

GetACP being compiled on Windows returns 1252, which is not the UTF-8 code page. And, basically, it would depend on the OS language settings.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at mingw-w64, GetACP has returned CP_UTF8 since it was introduced.

I see it differently. I've cross-compiled this code:

#include <iostream>
#include <windows.h>

int main()
{
    std::cout << GetACP() << "\n";
}

and it prints 1252 on Windows.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw: GetACP is a windows system function implemented in kernel32.dll, defined in the winnls.h header. There shouldn't be a mingw-specific definition.

CP_UTF8 is also defined in that header and has the following value:

#define CP_UTF8 65001

1252 is codepage windows-1252, a legacy encoding.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants