Improve data onboarding speed: ipfs add and ipfs dag import|export #10383

@mishmosh

Checklist

  • My issue is specific & actionable.
  • I am not suggesting a protocol enhancement.
  • I have searched on the issue tracker for my issue.

Description

This is a follow-up on user @endomorphosis's comment in the Filecoin community discussions about IPFS hashing being slow:

I noticed when trying to index large ML models that the IPFS daemon hashing seems to be single-threaded, and therefore somewhat slow when indexing large files. If this is funded, it is my hope that someone in your org can try to create a new spec to parallelize the hashing of large files.

Per @lidel in an ipfs-steering conversation on 2 April 2024:

In my mind this is not about inventing new hashing specifications; this is about making the most popular implementation, the one the majority of the ecosystem uses for data onboarding (Kubo), better. My translation:

the IPFS daemon hashing [..] slow when indexing large files
→ Kubo commands like ipfs add are not as fast as they "should be" when compared with sha256sum over the number of chunks

parallelize the hashing of large files.
→ improve the implementation: make core commands like ipfs dag import|export and ipfs add as fast as possible (we know they are not)

Once we have a reference implementation, we can add some rules of thumb on how to implement UnixFS hashing and chunking to the "notes for implementers" section of the WIP UnixFS specification.
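
To make the "parallelize the hashing of large files" idea concrete, here is a rough sketch, not Kubo code. It assumes fixed-size 256 KiB chunks (which, as far as I know, matches Kubo's default chunker) and plain SHA-256 per chunk, and it skips CID encoding and DAG building entirely. The function names are made up for illustration. Because each chunk is hashed independently, the per-chunk work can be spread over a thread pool; Python's hashlib releases the GIL while hashing larger buffers, so threads are enough here.

```python
import hashlib
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

# Assumption: fixed-size chunking at 256 KiB, for illustration only.
CHUNK_SIZE = 256 * 1024


def read_chunks(path, chunk_size=CHUNK_SIZE):
    """Yield consecutive fixed-size chunks of the file at `path`."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def sha256_digest(chunk):
    """Return the raw SHA-256 digest of a single chunk."""
    return hashlib.sha256(chunk).digest()


def hash_chunks_sequential(path):
    """Baseline: hash every chunk on one thread, in file order."""
    return [sha256_digest(chunk) for chunk in read_chunks(path)]


def hash_chunks_parallel(path, workers=None):
    """Hash chunks on a thread pool; the result list keeps file order.

    Executor.map submits every chunk up front, so this sketch holds the
    whole file in memory. A real implementation would bound the queue.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(sha256_digest, read_chunks(path)))


if __name__ == "__main__":
    # Write 32 MiB of random data to a temporary file and time both paths.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(32 * 1024 * 1024))
    try:
        t0 = time.perf_counter()
        seq = hash_chunks_sequential(tmp.name)
        t1 = time.perf_counter()
        par = hash_chunks_parallel(tmp.name)
        t2 = time.perf_counter()
        assert seq == par  # same digests, same order
        print(f"{len(seq)} chunks: sequential {t1 - t0:.3f}s, "
              f"parallel {t2 - t1:.3f}s")
    finally:
        os.unlink(tmp.name)
```

The sequential function plays the role of the "sha256sum over the number of chunks" baseline from the translation above. How much the threaded version gains depends on core count, disk speed, and chunk size, so any timings from this sketch should be treated as a local measurement and not as a statement about ipfs add.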

Metadata

Assignees

No one assigned

    Labels

    • P2 (Medium: Good to have, but can wait until someone steps up)
    • effort/weeks (Estimated to take multiple weeks)
    • exp/expert (Having worked on the specific codebase is important)
    • kind/enhancement (A net-new feature or improvement to an existing feature)
