Skip to content

Conversation

ryanofsky
Copy link
Contributor

Add a developer note about avoiding fs::PathToString in RPCs, and improve some other fs::PathToString comments.

Developer note might have been useful in two recent review comments:

Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 9b575f1

@maflcko
Copy link
Member

maflcko commented Nov 16, 2021

Is there any way to enforce this without looking at the code with human eyes? Are the tests going to fail if we switch to std filesystem and have an incorrect string conversion?

@laanwj
Copy link
Member

laanwj commented Nov 16, 2021

Documentation review ACK 9b575f1

Are the tests going to fail if we switch to std filesystem and have an incorrect string conversion?

Yes, especially the Windows tests will fail, as the situation is most dire there. Both the functional and unit tests, as they both have some tests that use unicode characters in paths (that are not in any of the "narrow" codepages).

Copy link
Member

@hebasto hebasto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 9b575f1

Thanks for improving documentation on that topic!

Copy link
Contributor

@shaavan shaavan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 9b575f1

This PR intends to add documentation for the PathToString functions.
This PR:

  • Clarifies where it is safe to use this command and where it used to be replaced with other functions.
  • Rewords the comments in the fs.h file detailing pathToString usage to make them more logically and verbally sound.
  • Move the Windows and POSIX implementation notes under the PathToString function

The documentation is nicely worded and easy to understand. There is no critical grammar error in the wording. However, I found some minor grammatical corrections that you might consider if you update the PR for some other crucial reason.

@jamesob
Copy link
Contributor

jamesob commented Nov 16, 2021

ACK 9b575f1

@fanquake fanquake merged commit b869a78 into bitcoin:master Nov 17, 2021
@ryanofsky
Copy link
Contributor Author

re: #23522 (comment)

Is there any way to enforce this without looking at the code with human eyes? Are the tests going to fail if we switch to std filesystem and have an incorrect string conversion?

I think ideally PathToString and PathFromString functions would be eliminated. I wrote these functions to allow switching from boost::filesystem to std::filesystem while preserving current behavior, but current behavior is actually a little broken.

PathToString is mostly used for logging, but this is not ideal because the result is usually not quoted, and paths can contain spaces, quotes, punctuation and text that make resulting log messages ambiguous. They could also contain arbitrary characters including newlines, invalid UTF-8 sequences, invalid multibyte sequences that break log parsing. I think ideally these PathToString calls would be replaced with calls to a new PathToLogString function, and that function would add surrounding quotes around paths, replace internal quotes with \", replace special ASCII characters like tabs and newlines with C \t \n escapes, replace non-latin unicode characters with \uXXXX escapes on windows, and replace non-latin bytes with \xXX escapes on POSIX.

PathFromString is mostly used to parse command line and configuration arguments. I think ideally these calls would be removed and replaced with calls to a new ArgsManager::GetPathArg method. The method would treat command line arguments and configuration file paths as bytes on POSIX with no encoding or decoding. On windows, it would decode configuration file paths as UTF-8, and it would either decode command line arguments according to the current code page, or it would bypass the code page encoding and decoding done by the windows C library by overriding wmain and getting the original unicode arguments passed by the parent process in wchar_t *argv[]

So ideally we could get rid of PathToString to PathFromString entirely. But more narrowly if we just wanted to try minimize new encoding bugs and avoid PathToString results being passed to the UniValue constructor, we would need to use distinct string types like std::string for byte strings and std::u8string for unicode strings and make it an error to construct a UniValue from a byte string. We could also try to renaming the PathToString function to discourage use. Maybe to something like PathToByteString, PathToInternalString, PathToSystemString, LegacyPathToString

@ryanofsky
Copy link
Contributor Author

Thanks for all reviews and thanks @shaavan for suggested edits. I didn't get to apply them here, but they could be used in another followup or maybe #20744

sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Nov 17, 2021
9b575f1 Improve fs::PathToString documentation (Russell Yanofsky)

Pull request description:

  Add a developer note about avoiding `fs::PathToString` in RPCs, and improve some other `fs::PathToString` comments.

  Developer note might have been useful in two recent review comments:

  - bitcoin#23398 (comment)
  - bitcoin#23155 (comment)

ACKs for top commit:
  laanwj:
    Documentation review ACK 9b575f1
  jamesob:
    ACK bitcoin@9b575f1
  prayank23:
    ACK bitcoin@9b575f1
  hebasto:
    ACK 9b575f1
  shaavan:
    ACK 9b575f1

Tree-SHA512: b8b3ecb6208c3897241e4f24dcec64fe7cf091bc79388862cf5f4b315cb8e804939981c4bed4c81dbff99ec9f750bad99015d0f04890704ac9df63c2a6719b6d
@bitcoin bitcoin locked and limited conversation to collaborators Jan 11, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants