Conversation

@chocolatkey

Closes #15

New manifest command help output:

```
Generate a Readium Web Publication Manifest for a publication.

This command will parse a publication file (such as EPUB, PDF, audiobook, etc.)
and build a Readium Web Publication Manifest for it. The JSON manifest is
printed to stdout.

Examples:
  Print out a compact JSON RWPM.
  $ readium manifest publication.epub

  Pretty-print a JSON RWPM using two-space indent.
  $ readium manifest --indent "  " publication.epub

  Extract the publication title with `jq`.
  $ readium manifest publication.epub | jq -r .metadata.title

Usage:
  readium manifest <pub-path> [flags]

Flags:
      --hash strings                                 Hashes to use when enhancing links, such as with image inspection. Note visual hashes are more computationally expensive. Acceptable values: sha256,md5,phash-dct,https://blurha.sh (default [sha256,md5])
  -h, --help                                         help for manifest
  -i, --indent string                                Indentation used to pretty-print
      --infer-a11y string                            Infer accessibility metadata: no, merged, split (default "no")
      --infer-a11y-ignore-image-dir string           Ignore the images in a given directory when inferring textual accessibility.
      --infer-a11y-ignore-image-dir-hashes strings   Hashes gathered for images in ignored images directory. (default [sha256,phash-dct])
      --infer-a11y-ignore-image-hashes strings       Ignore the given hashes when inferring textual accessibility. Hashes are in the format <algorithm>:<base64 value>, separated by commas.
      --infer-page-count                             Infer the number of pages from the generated position list.
      --inspect-images                               Inspect images in the manifest. Their links will be enhanced with size, width and height, and hashes
```
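
For example, a hypothetical invocation exercising the new accessibility flags (the publication path and directory name are made up):

```console
# Infer merged accessibility metadata, ignoring any publication images
# that match the images found in the decorations/ directory.
$ readium manifest --infer-a11y merged \
    --infer-a11y-ignore-image-dir decorations/ \
    publication.epub
```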

One thing to note: since the pHash algorithm is included by default for the ignorable image directory, collisions are possible because the hash is not exact. This is illustrated by a test I added to the go-toolkit comparing these two images: https://github.com/readium/go-toolkit/blob/develop/pkg/analyzer/testdata/frame1.png vs. https://github.com/readium/go-toolkit/blob/develop/pkg/analyzer/testdata/frame2.png
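
Individual hashes can also be ignored directly with `--infer-a11y-ignore-image-hashes`, using the `<algorithm>:<base64 value>` format from the help text. A sketch with placeholder values:

```console
# Ignore two specific image hashes (the base64 values here are
# placeholders) when inferring textual accessibility.
$ readium manifest --infer-a11y merged \
    --infer-a11y-ignore-image-hashes "sha256:n4bQgYhMfWWaL+qgxVrQFaO/TxsrC4Is0V1sFbDwCgg=,phash-dct:gICAgICAgIA=" \
    publication.epub
```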

@chocolatkey requested a review from @HadrienGardeur on May 8, 2025 at 07:24
@HadrienGardeur

> One thing to note: since the pHash algorithm is included by default for the ignorable image directory, collisions are possible because the hash is not exact.

I'm not too worried about this since it's more of a feature than a bug with perceptual hashing.

> `--infer-a11y-ignore-image-dir string` Ignore the images in a given directory when inferring textual accessibility.

This is straightforward, nothing to say about it.

> `--infer-a11y-ignore-image-dir-hashes strings` Hashes gathered for images in ignored images directory. (default [sha256,phash-dct])

I'm not sure if this one is clear enough. This is used to set which algorithms are used with `--infer-a11y-ignore-image-dir`, right?

> `--infer-a11y-ignore-image-hashes strings` Ignore the given hashes when inferring textual accessibility. Hashes are in the format `<algorithm>:<base64 value>`, separated by commas.

Nothing to say about this one either.

I'm interested in having @gautierchomel's take on this as well.

@chocolatkey (Author)

> I'm not sure if this one is clear enough. This is used to set which algorithms are used with `--infer-a11y-ignore-image-dir`, right?

Yes

> I'm not too worried about this since it's more of a feature than a bug with perceptual hashing.

OK, but I think it should be made clear to users of the tool that this might cause mistakes in the evaluation.

@HadrienGardeur

> OK, but I think it should be made clear to users of the tool that this might cause mistakes in the evaluation.

We can cover that in the README; I've opened a separate branch for that.

@HadrienGardeur

> `--infer-a11y-ignore-image-dir-hashes strings` Hashes gathered for images in ignored images directory. (default [sha256,phash-dct])

I would rename that one to `--infer-a11y-ignore-image-dir-algorithms`, but I'm also wondering if we couldn't simply use `--hash` for that purpose as well.

Right now we have different algorithms used for `--hash` (MD5 and SHA256) and `--infer-a11y-ignore-image-dir-hashes` (SHA256 and pHash-DCT), but since we need to use the same algorithms to compare them, it doesn't make much sense.

@HadrienGardeur

Based on our latest call with @chocolatkey:

- we'll default to SHA256 only for both `--inspect-images` and `--infer-a11y-ignore-image-dir`
- we'll use `--hash` as an optional flag for `--infer-a11y-ignore-image-dir`, since there are no good reasons to have a separate flag
- if you include hashes from other algorithms with `--infer-a11y-ignore-image-hashes`, those are automatically calculated as well; there's no need to include them with `--hash` (see the sketch below)

I'll continue working on the README branch with these new decisions in mind.
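
Under these decisions, a combined invocation could look something like this (paths and algorithm values are illustrative):

```console
# --hash selects the algorithms computed across the board: both for
# enhancing link objects and for comparing against the ignorable
# images directory.
$ readium manifest --inspect-images --hash sha256,phash-dct \
    --infer-a11y merged --infer-a11y-ignore-image-dir decorations/ \
    publication.epub
```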

@mickael-menu

> if you include hashes from other algorithms with `--infer-a11y-ignore-image-hashes`, those are automatically calculated as well; there's no need to include them with `--hash`

Are they then added to the Link object in the output manifest? I think this would be a weird side effect.

If I understand correctly, I feel like we're conflating the hashes that are used for processing and the ones that the user wants to output in the manifest for future consumption. Here are some ideas:

- `--hash` or `--link-hash` provides the hashes that will be added to the manifest link objects.
- We drop `--infer-a11y-ignore-image-dir-hashes`; instead we always use SHA256 when comparing the image-dir hashes, or reuse whatever cryptographic hash is already computed if we want to be compute-smart. But I don't think that's necessary. If there's a need to support perceptual hashes, we can do it by default or add a boolean option `--infer-a11y-ignore-image-dir-perceptual` and use whatever perceptual algorithm we think is best in the toolkit.
- The algorithms used in `--infer-a11y-ignore-image-hashes` are not automatically added to the link objects in the output, unless also added to `--hash`. (A sketch of this separation follows below.)
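
For example, something along these lines (neither `--link-hash` nor `--infer-a11y-ignore-image-dir-perceptual` exists today; they're only a sketch of the proposal):

```console
# Only SHA256 hashes end up on the link objects, while a perceptual
# hash is still used internally to match ignorable images.
$ readium manifest --inspect-images --link-hash sha256 \
    --infer-a11y merged --infer-a11y-ignore-image-dir decorations/ \
    --infer-a11y-ignore-image-dir-perceptual \
    publication.epub
```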

@HadrienGardeur

> Are they then added to the Link object in the output manifest? I think this would be a weird side effect.

No, that would require using the `--inspect-images` flag.

But in order to compare (and potentially ignore) images, you need to compute these hashes. That's why I feel that a single flag (`--hash`) makes a lot more sense.

It becomes a way to indicate what should be computed across the board.

> […] instead we always use SHA256 when comparing the image-dir hashes, or reuse whatever cryptographic hash is already computed if we want to be compute-smart.

I don't think that's a good idea because some implementers will want to use a perceptual hash in addition to a cryptographic one.

> […] If there's a need to support perceptual hashes, we can do it by default or add a boolean option `--infer-a11y-ignore-image-dir-perceptual` and use whatever perceptual algorithm we think is best in the toolkit.

We already have two different algorithms for cryptographic hashes and another two for perceptual hashes. The list might become longer over time, and implementers who have stored hashes with one of these algorithms (say, pHash-DCT) might want to keep using it.

@HadrienGardeur left a comment

This can be merged after implementing changes requested in #22 (comment)

@chocolatkey merged commit 43c0630 into develop on May 21, 2025
4 checks passed
@chocolatkey deleted the ignorable-images-a11y branch on May 21, 2025 at 07:47