Skip to content

Conversation

birnam
Copy link
Contributor

@birnam birnam commented Apr 22, 2025

In the past, the preview images for bookmarks from Reddit links were poorly chosen. Reddit does not use opengraph tags, so metascraper-images simply looked for all images on the page and returned the first. This tended to be the profile picture for the poster for the Reddit link.

This new plugin, using the existing metascraper framework, provides a better selection of image for the bookmark when the URL domain is 'reddit'.

In addition, recent changes (I believe this was a side effect of adding the metascraper-author and/or the metascaper-publisher plugins, but it could also be related to the metascraper-readibility plugin) broke what used to be a good choice of bookmark title. Previously, titles looked like 'Tinyauth just reached 1000 stars! : r/selfhosted' with both thread title and subreddit mentioned. After this update, all Reddit posts now have the same title: 'The heart of the internet'.

To return to the better format, this new metascraper-reddit plugin now attempts to retrieve the better title from reddit URLs. Note that in order to gain precedence in title selection, the 'metascraperReddit()' inclusion in the crawlerWorkers.ts metascraper instantiation list had to be moved above metascraperReadability().

birnam added 2 commits April 22, 2025 14:15
…ascraper is now redundant; also updating to latest versions of metascraper libraries
In the past, the preview images for bookmarks from Reddit links were
poorly chosen. Reddit does not use opengraph tags, so metascraper-images
simply looked for all images on the page and returned the first. This
tended to be the profile picture for the poster for the Reddit link.

This new plugin, using the existing metascraper framework, provides a
better selection of image for the bookmark when the URL domain is
'reddit'.

In addition, recent changes (I believe this was a side effect of adding
the metascraper-author and/or the metascaper-publisher plugins, but it
could also be related to the metascraper-readibility plugin) broke what
used to be a good choice of bookmark title. Previously, titles looked
like 'Tinyauth just reached 1000 stars! : r/selfhosted' with both thread
title and subreddit mentioned. After this update, all Reddit posts now
have the same title: 'The heart of the internet'.

To return to the better format, this new metascraper-reddit plugin now
attempts to retrieve the better title from reddit URLs. Note that in
order to gain precendence in title selection, the 'metascraperReddit()'
inclusion in the crawlerWorkers.ts metascraper instantiation list had to
be moved above metascraperReadability().
birnam added 3 commits May 29, 2025 09:42
@birnam
Copy link
Contributor Author

birnam commented May 29, 2025

My merge request into metascraper was accepted. I've updated the versions to use the latest which means I could remove the @ignore line that expected the typecheck error. I also updated a path to reflect the new directory structure.

@MohamedBassem MohamedBassem merged commit 7cc4b08 into karakeep-app:main Jun 22, 2025
5 checks passed
@MohamedBassem
Copy link
Collaborator

I tried it, and it works perfectly, thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants