Bookmarks/links that archive PDF content are not searchable with url:<substring> #874

@ahgraber

Describe the Bug

I'm using Hoarder to archive whitepapers from arxiv.org. When I bookmark a paper (https://arxiv.org/pdf/1706.03762), Hoarder creates an "asset" object that stores the address in 'sourceUrl', whereas a "link" object stores it in 'url'. As a result, when I search for bookmarks with url:arxiv.org, none of the whitepapers are found.

Steps to Reproduce

  1. Create a PDF bookmark (https://arxiv.org/pdf/1706.03762)
  2. Search for url:arxiv.org; the PDF bookmark is missing from the results

Expected Behaviour

The url: search should also match against sourceUrl, so that url:arxiv.org returns PDF bookmarks as well as link bookmarks.

Screenshots or Additional Context

The first object is a PDF asset bookmark; the second is a regular link bookmark.

[
  {
    "id": "erie3afnpzcymf1i9oh9bauj",
    "createdAt": "2025-01-11T01:19:09.000Z",
    "title": null,
    "archived": false,
    "favourited": false,
    "taggingStatus": "success",
    "note": null,
    "summary": null,
    "tags": [{}],
    "content": {
      "type": "asset",
      "assetType": "pdf",
      "assetId": "b81dfb4a-cfb7-466f-bd27-7cd85f0c2741",
      "fileName": "1706.03762",
      "sourceUrl": "https://arxiv.org/pdf/1706.03762"
    },
    "assets": [
      {
        "id": "b81dfb4a-cfb7-466f-bd27-7cd85f0c2741",
        "assetType": "bookmarkAsset"
      }
    ]
  },
  {
    "id": "ss5bifixtn4o8pyqf3ke375e",
    "createdAt": "2025-01-12T01:29:42.000Z",
    "title": null,
    "archived": false,
    "favourited": false,
    "taggingStatus": "success",
    "note": null,
    "summary": null,
    "tags": [{}],
    "content": {
      "type": "link",
      "url": "https://magazine.sebastianraschka.com/p/understanding-large-language-models",
      "title": "Understanding Large Language Models",
      "description": "Explore the transformative power of large language models in AI. Dive into a curated reading list for ML enthusiasts. Discover the impact of transformers on NLP, vision, and biology.",
      "imageUrl": "...",
      "imageAssetId": "23500314-1405-4356-ab71-b763c02ce119",
      "favicon": "...",
      "htmlContent": "...",
      "crawledAt": "2025-01-12T01:29:46.000Z"
    },
    "assets": [
      {
        "id": "23500314-1405-4356-ab71-b763c02ce119",
        "assetType": "bannerImage"
      }
    ]
  }
]
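
To make the expected behaviour concrete, here is a minimal Python sketch that runs over the two objects above, trimmed to the fields that matter for URL search. It is not Hoarder's actual search code: the helper names (searchable_url, matches_url_query, search_by_url) are hypothetical, and plain case-sensitive substring matching is an assumption about how url:<substring> should behave.

```python
from typing import Any, Dict, List, Optional

# Trimmed copies of the two bookmarks from this issue.
# Only the fields relevant to URL search are kept.
BOOKMARKS: List[Dict[str, Any]] = [
    {
        "id": "erie3afnpzcymf1i9oh9bauj",
        "content": {
            "type": "asset",
            "assetType": "pdf",
            "sourceUrl": "https://arxiv.org/pdf/1706.03762",
        },
    },
    {
        "id": "ss5bifixtn4o8pyqf3ke375e",
        "content": {
            "type": "link",
            "url": "https://magazine.sebastianraschka.com/p/understanding-large-language-models",
        },
    },
]


def searchable_url(bookmark: Dict[str, Any]) -> Optional[str]:
    """Return the address a url: query should see (hypothetical helper)."""
    content = bookmark.get("content", {})
    # Link bookmarks keep the address in "url"; asset bookmarks keep it
    # in "sourceUrl". Fall back to the latter when "url" is absent.
    return content.get("url") or content.get("sourceUrl")


def matches_url_query(bookmark: Dict[str, Any], substring: str) -> bool:
    """Assumed semantics: plain substring match on the bookmark's address."""
    url = searchable_url(bookmark)
    return url is not None and substring in url


def search_by_url(bookmarks: List[Dict[str, Any]], substring: str) -> List[str]:
    """Return the ids of all bookmarks whose address contains the substring."""
    return [b["id"] for b in bookmarks if matches_url_query(b, substring)]


print(search_by_url(BOOKMARKS, "arxiv.org"))  # ['erie3afnpzcymf1i9oh9bauj']
```

With the sourceUrl fallback in place, url:arxiv.org finds the PDF bookmark, while link bookmarks continue to match on url exactly as before.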

Device Details

No response

Exact Hoarder Version

0.21.0

Have you checked the troubleshooting guide?

  • I have checked the troubleshooting guide and I haven't found a solution to my problem

Labels

bug (Something isn't working)
