MINGW32_NT-6.1-7601 PC 3.3.6-341.x86_64 2022-11-20 15:12 UTC x86_64 Msys
Compiler
GCC 12.2.0
Virtualization / Containers
No response
CPU
Intel Core i7 Q720
Current Behavior
If tesseract is built without the legacy engine (--disable-legacy), recognizing vertical Japanese text with jpn_vert (from tessdata_fast) and PSM_AUTO (--psm 3) produces garbage. Here is an example image:
Here is the output of tesseract 1.png stdout -l jpn_vert --psm 3:
…4
/
\
09$2pY
コ べ
ほり メー14
Here is the correct output (albeit with some OCR errors), which I get either from a build without --disable-legacy or when explicitly using PSM_SINGLE_BLOCK_VERT_TEXT (--psm 5), regardless of whether the legacy engine is enabled:
ケー タイ は
カバ パ バン に
入れ て ある し
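The comparison between the two page segmentation modes can be scripted. The following is a minimal sketch, not part of the original report: it assumes a `tesseract` binary on `PATH`, the `jpn_vert` traineddata installed, and an input file named `1.png` as in the commands above. The helper names `build_cmd`, `run_ocr` and `compare_modes` are hypothetical.

```python
import subprocess


def build_cmd(image, psm, lang="jpn_vert"):
    """Build the tesseract argv for one page segmentation mode.

    Mirrors the command from the report:
    tesseract <image> stdout -l <lang> --psm <psm>
    """
    return ["tesseract", image, "stdout", "-l", lang, "--psm", str(psm)]


def run_ocr(image, psm, lang="jpn_vert"):
    """Run tesseract once and return the recognized text.

    Assumes tesseract is on PATH; raises CalledProcessError on failure.
    """
    result = subprocess.run(
        build_cmd(image, psm, lang),
        capture_output=True,
        encoding="utf-8",  # tesseract writes UTF-8 text to stdout
        check=True,
    )
    return result.stdout


def compare_modes(image, modes=(3, 5)):
    """Return {psm: text} so the outputs of each mode can be compared."""
    return {psm: run_ocr(image, psm) for psm in modes}
```

Calling `compare_modes("1.png")` against a build configured with --disable-legacy and against a default build should show whether the `--psm 3` result differs between the two while the `--psm 5` result stays the same.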
Expected Behavior
No response
Suggested Fix
If this is intended behavior rather than a bug, it should probably be documented somewhere.