Description
Describe the Bug
I've just cloned main to test updating the Proxmox LXC install to v0.23. I'm not sure whether this is a bug or I'm just being impatient, but when I add a PDF, the text extraction (OCR) job runs and the screenshot generation job does not. The PDFs look like this:
If I then go into Admin settings and trigger a reprocess, the screenshot is generated:
Steps to Reproduce
- Install or update Hoarder to the latest main
- Add a PDF
- Wait and refresh the page
- Trigger a manual reprocess job from Admin settings, then see the screenshot generated
Expected Behaviour
Unless I'm misunderstanding how it's supposed to work, I expected that adding a PDF would run multiple jobs: at least one for the OCR and another for screenshot generation.
Screenshots or Additional Context
I'm not using the Docker version, but a Proxmox LXC install using the script we created.
I've broken the log output into sections to mark when certain events occur as a result of my actions. Nothing has been left out.
The new dependencies are installed:
root@hoarder-v023:~# dpkg -l | grep ghostscript
ii ghostscript 10.0.0~dfsg-11+deb12u6 amd64 interpreter for the PostScript language and for PDF
root@hoarder-v023:~# dpkg -l | grep graphicsmagick
ii graphicsmagick 1.4+really1.3.40-4 amd64 collection of image processing tools
ii libgraphicsmagick-q16-3 1.4+really1.3.40-4 amd64 format-independent image processing - C shared library
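dpkg only shows that the packages are installed. Below is a minimal sketch for also checking that the executables resolve on PATH. It assumes the worker calls Ghostscript and GraphicsMagick through their usual command names, `gs` and `gm`; I haven't confirmed in the Hoarder source that this is how they are invoked. `find_tools` is a name made up for this example.

```python
import shutil


def find_tools(names=("gs", "gm")):
    """Map each executable name to its resolved path, or None if not on PATH.

    Assumption: "gs" (Ghostscript) and "gm" (GraphicsMagick) are the
    command names the worker would look for.
    """
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

To be meaningful, this should run as the same user and with the same environment as the service that starts the workers, since PATH can differ between an interactive root shell and a systemd unit.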
Adding a PDF:
Mar 06 19:06:20 hoarder-v023 pnpm[12413]: 2025-03-07T00:06:20.350Z info: [Crawler] Connecting to existing browser instance: http://127.0.0.1:9222
Mar 06 19:06:20 hoarder-v023 pnpm[12413]: 2025-03-07T00:06:20.350Z info: [Crawler] Successfully resolved IP address, new address: http://127.0.0.1:9222/
Mar 06 19:06:20 hoarder-v023 pnpm[12413]: 2025-03-07T00:06:20.399Z info: Starting crawler worker ...
Mar 06 19:06:20 hoarder-v023 pnpm[12413]: 2025-03-07T00:06:20.399Z info: Starting inference worker ...
Mar 06 19:06:20 hoarder-v023 pnpm[12413]: 2025-03-07T00:06:20.399Z info: Starting search indexing worker ...
Mar 06 19:06:20 hoarder-v023 pnpm[12413]: 2025-03-07T00:06:20.399Z info: Starting tidy assets worker ...
Mar 06 19:06:20 hoarder-v023 pnpm[12413]: 2025-03-07T00:06:20.399Z info: Starting video worker ...
Mar 06 19:06:20 hoarder-v023 pnpm[12413]: 2025-03-07T00:06:20.400Z info: Starting feed worker ...
Mar 06 19:06:20 hoarder-v023 pnpm[12413]: 2025-03-07T00:06:20.400Z info: Starting asset preprocessing worker ...
Mar 06 19:06:20 hoarder-v023 pnpm[12413]: 2025-03-07T00:06:20.400Z info: Starting webhook worker ...
Mar 06 19:08:03 hoarder-v023 pnpm[12413]: 2025-03-07T00:08:03.583Z info: [Crawler][69] Will crawl "https://getsamplefiles.com/download/pdf/sample-1.pdf" for link with id "df6uiumrmz6mluy1bq9r26zf"
Mar 06 19:08:03 hoarder-v023 pnpm[12413]: 2025-03-07T00:08:03.584Z info: [Crawler][69] Attempting to determine the content-type for the url https://getsamplefiles.com/download/pdf/sample-1.pdf
Mar 06 19:08:03 hoarder-v023 pnpm[12413]: 2025-03-07T00:08:03.627Z info: [webhook][71] Starting a webhook job for bookmark with id "df6uiumrmz6mluy1bq9r26zf"
Mar 06 19:08:03 hoarder-v023 pnpm[12413]: 2025-03-07T00:08:03.627Z info: [webhook][71] Completed successfully
Mar 06 19:08:03 hoarder-v023 pnpm[12413]: 2025-03-07T00:08:03.634Z info: [search][70] Attempting to index bookmark with id df6uiumrmz6mluy1bq9r26zf ...
Mar 06 19:08:03 hoarder-v023 pnpm[12413]: 2025-03-07T00:08:03.704Z info: [search][70] Completed successfully
Mar 06 19:08:03 hoarder-v023 pnpm[12413]: 2025-03-07T00:08:03.710Z info: [Crawler][69] Content-type for the url https://getsamplefiles.com/download/pdf/sample-1.pdf is "application/pdf"
Mar 06 19:08:03 hoarder-v023 pnpm[12413]: 2025-03-07T00:08:03.710Z info: [Crawler][69] Downloading pdf from "https://getsamplefiles.com/download/pdf/sample-1.pdf"
Mar 06 19:08:03 hoarder-v023 pnpm[12413]: 2025-03-07T00:08:03.731Z info: [Crawler][69] Downloaded pdf as assetId: 3ff23b35-95c1-444c-b6a9-73146ce01a44
Mar 06 19:08:03 hoarder-v023 pnpm[12413]: 2025-03-07T00:08:03.742Z info: [Crawler][69] Completed successfully
Mar 06 19:08:04 hoarder-v023 pnpm[12413]: 2025-03-07T00:08:04.639Z info: [assetPreprocessing][72] Starting an asset preprocessing job for bookmark with id "df6uiumrmz6mluy1bq9r26zf"
Mar 06 19:08:04 hoarder-v023 pnpm[12413]: 2025-03-07T00:08:04.642Z info: [assetPreprocessing][72] Attempting to extract text from pdf.
Mar 06 19:08:04 hoarder-v023 pnpm[12413]: Warning: Setting up fake worker.
Mar 06 19:08:04 hoarder-v023 pnpm[12413]: 2025-03-07T00:08:04.840Z info: [assetPreprocessing][72] Extracted 2212 characters from pdf.
Mar 06 19:08:04 hoarder-v023 pnpm[12413]: 2025-03-07T00:08:04.850Z info: [assetPreprocessing][72] Completed successfully
Mar 06 19:08:05 hoarder-v023 pnpm[12413]: 2025-03-07T00:08:05.629Z debug: [inference][73] No inference client configured, nothing to do now
Mar 06 19:08:05 hoarder-v023 pnpm[12413]: 2025-03-07T00:08:05.629Z info: [inference][73] Completed successfully
Mar 06 19:08:05 hoarder-v023 pnpm[12413]: 2025-03-07T00:08:05.789Z info: [search][74] Attempting to index bookmark with id df6uiumrmz6mluy1bq9r26zf ...
Mar 06 19:08:05 hoarder-v023 pnpm[12413]: 2025-03-07T00:08:05.924Z info: [search][74] Completed successfully
Manually triggering a reprocessing job from the Admin console:
Mar 06 19:09:06 hoarder-v023 pnpm[12413]: 2025-03-07T00:09:06.948Z info: [assetPreprocessing][75] Starting an asset preprocessing job for bookmark with id "df6uiumrmz6mluy1bq9r26zf"
Mar 06 19:09:06 hoarder-v023 pnpm[12413]: 2025-03-07T00:09:06.952Z info: [assetPreprocessing][75] Skipping PDF text extraction as it's already been extracted.
Mar 06 19:09:06 hoarder-v023 pnpm[12413]: 2025-03-07T00:09:06.952Z info: [assetPreprocessing][75] Attempting to generate PDF screenshot for bookmarkId: df6uiumrmz6mluy1bq9r26zf
Mar 06 19:09:07 hoarder-v023 pnpm[12413]: 2025-03-07T00:09:07.293Z info: [assetPreprocessing][75] Successfully saved PDF screenshot to database
Mar 06 19:09:07 hoarder-v023 pnpm[12413]: 2025-03-07T00:09:07.295Z info: [assetPreprocessing][75] Completed successfully
Mar 06 19:09:07 hoarder-v023 pnpm[12413]: 2025-03-07T00:09:07.298Z info: [assetPreprocessing][76] Starting an asset preprocessing job for bookmark with id "jw045iecs73tcp72cs90xtwz"
Mar 06 19:09:07 hoarder-v023 pnpm[12413]: 2025-03-07T00:09:07.299Z info: [assetPreprocessing][76] Skipping PDF text extraction as it's already been extracted.
Mar 06 19:09:07 hoarder-v023 pnpm[12413]: 2025-03-07T00:09:07.299Z info: [assetPreprocessing][76] Skipping PDF screenshot generation as it's already been generated.
Mar 06 19:09:07 hoarder-v023 pnpm[12413]: 2025-03-07T00:09:07.299Z info: [assetPreprocessing][76] Completed successfully
Mar 06 19:09:07 hoarder-v023 pnpm[12413]: 2025-03-07T00:09:07.727Z debug: [inference][77] No inference client configured, nothing to do now
Mar 06 19:09:07 hoarder-v023 pnpm[12413]: 2025-03-07T00:09:07.728Z info: [inference][77] Completed successfully
Mar 06 19:09:08 hoarder-v023 pnpm[12413]: 2025-03-07T00:09:08.032Z info: [search][78] Attempting to index bookmark with id df6uiumrmz6mluy1bq9r26zf ...
Mar 06 19:09:08 hoarder-v023 pnpm[12413]: 2025-03-07T00:09:08.106Z info: [search][78] Completed successfully
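For anyone who wants to check their own logs for the same pattern, here is a rough sketch that groups log lines by worker and job number and lists the asset preprocessing jobs that never mention a PDF screenshot step. The function names are invented for this example, and the match on the phrase "PDF screenshot" is only based on the messages visible in the logs above, not on anything confirmed in the Hoarder source.

```python
import re
from collections import defaultdict

# Matches the "[worker][jobNumber] message" part of a log line.
JOB_LINE = re.compile(r"\[(?P<worker>\w+)\]\[(?P<job>\d+)\]\s+(?P<msg>.*)$")


def group_by_job(lines):
    """Return {(worker, job_number): [messages]} for lines that carry a job tag."""
    jobs = defaultdict(list)
    for line in lines:
        match = JOB_LINE.search(line)
        if match:
            key = (match.group("worker"), int(match.group("job")))
            jobs[key].append(match.group("msg"))
    return dict(jobs)


def jobs_without_screenshot_step(lines):
    """List assetPreprocessing job numbers with no message about a PDF screenshot."""
    return sorted(
        job
        for (worker, job), msgs in group_by_job(lines).items()
        if worker == "assetPreprocessing"
        and not any("PDF screenshot" in msg for msg in msgs)
    )


# A few lines copied from the logs in this report.
SAMPLE = [
    'Mar 06 19:08:04 hoarder-v023 pnpm[12413]: 2025-03-07T00:08:04.639Z info: [assetPreprocessing][72] Starting an asset preprocessing job for bookmark with id "df6uiumrmz6mluy1bq9r26zf"',
    "Mar 06 19:08:04 hoarder-v023 pnpm[12413]: 2025-03-07T00:08:04.642Z info: [assetPreprocessing][72] Attempting to extract text from pdf.",
    "Mar 06 19:08:04 hoarder-v023 pnpm[12413]: Warning: Setting up fake worker.",
    "Mar 06 19:08:04 hoarder-v023 pnpm[12413]: 2025-03-07T00:08:04.840Z info: [assetPreprocessing][72] Extracted 2212 characters from pdf.",
    "Mar 06 19:08:04 hoarder-v023 pnpm[12413]: 2025-03-07T00:08:04.850Z info: [assetPreprocessing][72] Completed successfully",
    "Mar 06 19:09:06 hoarder-v023 pnpm[12413]: 2025-03-07T00:09:06.952Z info: [assetPreprocessing][75] Skipping PDF text extraction as it's already been extracted.",
    "Mar 06 19:09:06 hoarder-v023 pnpm[12413]: 2025-03-07T00:09:06.952Z info: [assetPreprocessing][75] Attempting to generate PDF screenshot for bookmarkId: df6uiumrmz6mluy1bq9r26zf",
    "Mar 06 19:09:07 hoarder-v023 pnpm[12413]: 2025-03-07T00:09:07.293Z info: [assetPreprocessing][75] Successfully saved PDF screenshot to database",
]

if __name__ == "__main__":
    print(jobs_without_screenshot_step(SAMPLE))
```

On the sample lines, only job 72 (the one started when the PDF was first added) is reported, which matches what I see: the first preprocessing run extracts text and completes without ever attempting a screenshot.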
Device Details
Latest Firefox on Arch Linux
Exact Hoarder Version
Pulled from Main
Have you checked the troubleshooting guide?
- I have checked the troubleshooting guide and I haven't found a solution to my problem