Skip to content

Conversation

muja
Copy link
Owner

@muja muja commented Aug 28, 2024

This bug caused CJK chars in filename to be stripped out.

for example:

/media/mydisk/fooにほん.rar will result in /media/mydisk/foo so the open will fail

due to

    std::string AnsiArcName;
    if (r->ArcName!=nullptr)
    {
      AnsiArcName=r->ArcName;
      if (!AreFileApisANSI())
        IntToExt(r->ArcName,AnsiArcName);
    }

    std::wstring ArcName;
    if (r->ArcNameW!=nullptr && *r->ArcNameW!=0)
      ArcName=r->ArcNameW;
    else
      CharToWide(AnsiArcName,ArcName);

    Data->Cmd.AddArcName(ArcName);
    Data->Cmd.Overwrite=OVERWRITE_ALL;
    Data->Cmd.VersionControl=1;

    Data->Cmd.Callback=r->Callback;
    Data->Cmd.UserData=r->UserData;

    // Open shared mode is added by request of dll users, who need to
    // browse and unpack archives while downloading.
    Data->Cmd.OpenShared = true;
    if (!Data->Arc.Open(ArcName,FMF_OPENSHARED))
    {
      r->OpenResult=ERAR_EOPEN;
      delete Data;
      return NULL;
    }

in Data->Arc.Open it will do WideToChar which cause the CJK bytes got lost.

but if it calls CharToWide first (when ArcNameW is nullptr), when it does the reverse, it gets the right filename back.

@muja muja force-pushed the bug/unicode-filenames branch from e4199ca to 6392096 Compare August 28, 2024 18:25
@muja
Copy link
Owner Author

muja commented Aug 28, 2024

@ttys3 this is a copy of your PR #49 with an added test. Do you mind having a look at this regression? It seems the solution is not universal, not sure how much sense it makes to pursue this

@ttys3
Copy link
Contributor

ttys3 commented Aug 31, 2024

in this way, users of unrar.rs will get better DX and no need to call c method setlocale anymore.
and this issue #44 should been resolved too.

It took me hours to debugging this issue.

But I think the core reason is that the unrar code from rarlab, has bug while dealing with ArcNameW field.
but it does not have bug with ArcName field.

I think we are good to merge this.

@muja
Copy link
Owner Author

muja commented Aug 31, 2024

Are you able to see the GitHub checks?
Windows is failing checks:

     Running `D:\a\unrar.rs\unrar.rs\target\debug\deps\utf8-ad60e6114b4a8687.exe`

running 2 tests
test unicode_file ... FAILED
test unicode_list ... ok

failures:

---- unicode_file stdout ----
thread 'unicode_file' panicked at tests\utf8.rs:12:84:
called `Result::unwrap()` on an `Err` value: EOpen@Open (Could not open archive)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@muja muja force-pushed the bug/unicode-filenames branch 6 times, most recently from a9a1077 to 8f6450b Compare August 31, 2024 18:40
This bug caused CJK chars in filename to be stripped out on Linux.
For example, `/media/mydisk/fooにほん.rar`  would result in
`/media/mydisk/foo` so the open will fail

due to

```cpp
    if (r->ArcNameW!=nullptr && *r->ArcNameW!=0)
      ArcName=r->ArcNameW;
    else
      CharToWide(AnsiArcName,ArcName);

    Data->Arc.Open(ArcName,FMF_OPENSHARED)
```

in `Data->Arc.Open`, ArcName is converted with `WideToChar`
causing the CJK bytes to get lost. So we pass it as AnsiArcName,
so `CharToWide` is called first, causing to conversion to work.

Co-authored-by: Danyel <vcs@danyel.io>
Fixes #44
@muja muja force-pushed the bug/unicode-filenames branch from 8f6450b to ca47917 Compare August 31, 2024 18:41
@muja muja merged commit f3fb23d into master Aug 31, 2024
5 checks passed
@muja muja deleted the bug/unicode-filenames branch September 1, 2024 11:23
@ttys3
Copy link
Contributor

ttys3 commented Sep 1, 2024

@muja thanks for the second fixup.

Just checked the code:

bool File::Open(const std::wstring &Name,uint Mode)
{
ErrorType=FILE_SUCCESS;
FileHandle hNewFile;
bool OpenShared=File::OpenShared || (Mode & FMF_OPENSHARED)!=0;
bool UpdateMode=(Mode & FMF_UPDATE)!=0;
bool WriteMode=(Mode & FMF_WRITE)!=0;
#ifdef _WIN_ALL
uint Access=WriteMode ? GENERIC_WRITE:GENERIC_READ;
if (UpdateMode)
Access|=GENERIC_WRITE;
uint ShareMode=(Mode & FMF_OPENEXCLUSIVE) ? 0 : FILE_SHARE_READ;
if (OpenShared)
ShareMode|=FILE_SHARE_WRITE;
uint Flags=FILE_FLAG_SEQUENTIAL_SCAN;
FindData FD;
if (PreserveAtime)
Access|=FILE_WRITE_ATTRIBUTES; // Needed to preserve atime.
hNewFile=CreateFile(Name.c_str(),Access,ShareMode,NULL,OPEN_EXISTING,Flags,NULL);
DWORD LastError;
if (hNewFile==FILE_BAD_HANDLE)
{
LastError=GetLastError();
std::wstring LongName;
if (GetWinLongPath(Name,LongName))
{
hNewFile=CreateFile(LongName.c_str(),Access,ShareMode,NULL,OPEN_EXISTING,Flags,NULL);
// For archive names longer than 260 characters first CreateFile
// (without \\?\) fails and sets LastError to 3 (access denied).
// We need the correct "file not found" error code to decide
// if we create a new archive or quit with "cannot create" error.
// So we need to check the error code after \\?\ CreateFile again,
// otherwise we'll fail to create new archives with long names.
// But we cannot simply assign the new code to LastError,
// because it would break "..\arcname.rar" relative names processing.
// First CreateFile returns the correct "file not found" code for such
// names, but "\\?\" CreateFile returns ERROR_INVALID_NAME treating
// dots as a directory name. So we check only for "file not found"
// error here and for other errors use the first CreateFile result.
if (GetLastError()==ERROR_FILE_NOT_FOUND)
LastError=ERROR_FILE_NOT_FOUND;
}
}
if (hNewFile==FILE_BAD_HANDLE && LastError==ERROR_FILE_NOT_FOUND)
ErrorType=FILE_NOTFOUND;
if (PreserveAtime && hNewFile!=FILE_BAD_HANDLE)
{
FILETIME ft={0xffffffff,0xffffffff}; // This value prevents atime modification.
SetFileTime(hNewFile,NULL,&ft,NULL);
}
#else
int flags=UpdateMode ? O_RDWR:(WriteMode ? O_WRONLY:O_RDONLY);
#ifdef O_BINARY
flags|=O_BINARY;
#if defined(_AIX) && defined(_LARGE_FILE_API)
flags|=O_LARGEFILE;
#endif
#endif
// NDK r20 has O_NOATIME, but fails to create files with it in Android 7+.
#if defined(O_NOATIME)
if (PreserveAtime)
flags|=O_NOATIME;
#endif
std::string NameA;
WideToChar(Name,NameA);
int handle=open(NameA.c_str(),flags);
#ifdef LOCK_EX
#ifdef _OSF_SOURCE
extern "C" int flock(int, int);
#endif
if (!OpenShared && UpdateMode && handle>=0 && flock(handle,LOCK_EX|LOCK_NB)==-1)
{
close(handle);
return false;
}
#endif
if (handle==-1)
hNewFile=FILE_BAD_HANDLE;
else
{
#ifdef FILE_USE_OPEN
hNewFile=handle;
#else
hNewFile=fdopen(handle,UpdateMode ? UPDATEBINARY:READBINARY);
#endif
}
if (hNewFile==FILE_BAD_HANDLE && errno==ENOENT)
ErrorType=FILE_NOTFOUND;
#endif
NewFile=false;
HandleType=FILE_HANDLENORMAL;
SkipClose=false;
bool Success=hNewFile!=FILE_BAD_HANDLE;
if (Success)
{
hFile=hNewFile;
FileName=Name;
TruncatedAfterReadError=false;
}
return Success;
}

so, if ArcNameW is null and ArcName is not null, rarlab code will do WideToChar and and use it for the open operation.

but for Windows, it does not goes to the CharToWide logic due to macro def, it used the var directly

for Linux and macOS: it calls CharToWide first (when ArcNameW is nullptr), when it does the reverse, it gets the right filename back.

@ttys3
Copy link
Contributor

ttys3 commented Sep 1, 2024

so here is the thing:

for Windows, it is OK to set and use ArcNameW field only
for Linux, it is OK to set and use ArcName field only

for macOS, it seems that any one of the above logic works fine.

@muja
Copy link
Owner Author

muja commented Sep 1, 2024

Thanks for the investigation. I changed the implementation only for target_os = "linux" because the rest worked fine.

@ttys3
Copy link
Contributor

ttys3 commented Sep 1, 2024

just found anothing issue while I'm debugging the windows code in a real windows matchine.

the new rarlab code has a strange unicode in the comment, which cause msvc compile failed under some machines:

image

I'll create another PR

@ttys3
Copy link
Contributor

ttys3 commented Sep 1, 2024

just found anothing issue while I'm debugging the windows code in a real windows matchine.

PR created #57

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants