Regression in extracting text from Excel TIF image #4014


### Environment

[ExcelTest_Bug.zip](https://github.com/tesseract-ocr/tesseract/files/10669551/ExcelTest_Bug.zip)
[ExcelTest_text_TesseractV4.txt](https://github.com/tesseract-ocr/tesseract/files/10669573/ExcelTest_text_TesseractV4.txt)

- Tesseract versions: 5.2.0 (regressed) vs. 4.1.1.-rc2-37-gcla5 (earlier build)
- OS: Ubuntu 20.04.3 LTS

### Current Behavior:

With the attached TIF image of an Excel file (in the zip), Tesseract 5.2.0 extracts very little text: only a single line, "hiding rows 15 through 20". Earlier versions of Tesseract extract significantly more text from the same TIF image: multiple lines, approximately 1 KB of text across multiple pages. This is true of the 4.1.1 build noted above and likely of other versions as well. The V4.x output is attached as a separate text file (ExcelTest_text_TesseractV4.txt).
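To make the comparison easy to repeat, the following Python sketch runs two Tesseract builds on the same TIF and reports how much text each one returns. It assumes both binaries are installed side by side. The binary paths, the image name, and the script name `compare_ocr.py` are placeholders. The only CLI behavior it relies on is passing `stdout` as the output base, which makes `tesseract` print the recognized text instead of writing a `.txt` file.

```python
import subprocess
import sys
from typing import List, Tuple


def build_command(tesseract_bin: str, image_path: str) -> List[str]:
    # "stdout" as the output base tells the tesseract CLI to print the
    # recognized text to standard output instead of writing <base>.txt.
    return [tesseract_bin, image_path, "stdout"]


def summarize(text: str) -> Tuple[int, int]:
    # Count non-blank lines and total characters: a rough measure of how
    # much text a given Tesseract build recovered from the image.
    lines = [line for line in text.splitlines() if line.strip()]
    return len(lines), len(text)


def run_ocr(tesseract_bin: str, image_path: str) -> str:
    # Run one Tesseract binary on the image and return its text output.
    result = subprocess.run(
        build_command(tesseract_bin, image_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def main(argv: List[str]) -> int:
    # Expected arguments (placeholder paths): OLD_BIN NEW_BIN IMAGE.tif
    if len(argv) != 3:
        print("usage: compare_ocr.py OLD_TESSERACT NEW_TESSERACT IMAGE.tif",
              file=sys.stderr)
        return 2
    old_bin, new_bin, image = argv
    for label, binary in (("old", old_bin), ("new", new_bin)):
        lines, chars = summarize(run_ocr(binary, image))
        print(f"{label}: {lines} non-blank lines, {chars} characters")
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

With the attached image, the 4.1.1 build would be expected to report many lines, while 5.2.0 would report roughly one.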

### Expected Behavior:

I expect Tesseract 5.2+ to at least match the behavior of prior versions when extracting text from this sample TIF.

### Suggested Fix:

Correct the text extraction so that it matches the output of previous Tesseract versions. My concern is that Tesseract's ability to extract text from images of Excel files has regressed.