IPIP 0499: CID Profiles #499

mishmosh · 2025-04-03T14:03:02Z

Currently, CIDs can be generated with a variety of settings and optimizations for chunking, DAG width, and more. This means the same file can yield multiple, different CIDs depending on which tools and settings are used, and it is not possible to reliably reproduce or verify the CID.

This proposal introduces profiles for IPFS CIDs. Profiles explicitly define CID version, hash algorithm, chunk size, DAG width, layout, and other parameters. They can be used to verify data across implementations, provide recommended settings depending on retrieval performance goals, and more.
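To make the reproducibility problem concrete, here is a minimal toy sketch. It is an assumption-laden illustration, not real UnixFS or dag-pb encoding: `toy_root` is a hypothetical helper that hashes fixed-size chunks and then hashes groups of child digests, and the chunk sizes and link counts are illustrative values only. It shows that the same bytes give different roots as soon as chunk size or DAG width differ.

```python
import hashlib


def toy_root(data: bytes, chunk_size: int, max_links: int) -> str:
    """Toy Merkle root: NOT real UnixFS/dag-pb encoding, only the shape of the problem."""
    if chunk_size < 1 or max_links < 2:
        raise ValueError("chunk_size must be >= 1 and max_links >= 2")
    # Leaf level: one SHA-256 digest per chunk (a single empty chunk for empty input).
    level = [
        hashlib.sha256(data[i:i + chunk_size]).digest()
        for i in range(0, max(len(data), 1), chunk_size)
    ]
    # Interior levels: hash groups of up to max_links child digests until one root remains.
    while len(level) > 1:
        level = [
            hashlib.sha256(b"".join(level[i:i + max_links])).digest()
            for i in range(0, len(level), max_links)
        ]
    return level[0].hex()


data = bytes(range(256)) * 4096  # 1 MiB of sample input

small_chunks = toy_root(data, chunk_size=256 * 1024, max_links=174)
large_chunks = toy_root(data, chunk_size=1024 * 1024, max_links=174)
narrow_dag = toy_root(data, chunk_size=256 * 1024, max_links=2)

print(small_chunks == large_chunks)  # same data, different chunk size
print(small_chunks == narrow_dag)    # same data, different DAG width
print(small_chunks == toy_root(data, 256 * 1024, 174))  # same data, same settings
```

Only the last comparison uses identical settings, which is the property a profile is meant to guarantee across implementations.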

lidel · 2025-04-15T22:37:05Z

Thank you for kicking this off and filling in the initial state.

I've incorporated specific "dag width" settings for File, Directory and HAMTDirectory nodes, and updated the table to reflect the state from ipfs/kubo#10774 and the profiles that exist in the Kubo master branch: legacy-cid-v0, test-cid-v1 and test-cid-v1-wide:

https://github.com/ipfs/kubo/blob/master/config/profile.go#L268-L307

- agree what "cid-2025" profile should look like
  - this will be new default in "Kubo v1.0"
  - we have test-cid-v1 and test-cid-v1-wide in Kubo as potential candidates
- switch to PR from local branch (so we have build preview)
- figure out how to render the information (currently the table is not supported by https://github.com/ipfs/spec-generator)

2color · 2025-08-12T10:28:15Z

I pushed a bunch of edits to move the conversation forward. This is sorely needed in the ecosystem, and the hope is that by building consensus we can improve developer experience when working with UnixFS and the overall health of the UnixFS ecosystem.

Feedback is always appreciated.

hsanjuan · 2025-08-18T13:15:05Z

src/ipips/ipip-0499.md

+
+This proposal introduces configuration profiles for CIDs used to represent files and directories with UnixFS. These ensure that the deterministic CID generation for the same data, regardless of the implementation.
+
+Profiles explicitly define the UnixFS parameters, e.g. dag width, hash algorithem, and chunk size, that affect the resulting CID, such that given the profile and input data different implementations will generate identical CIDs.


Suggested change

Profiles explicitly define the UnixFS parameters, e.g. dag width, hash algorithem, and chunk size, that affect the resulting CID, such that given the profile and input data different implementations will generate identical CIDs.

Profiles explicitly define the UnixFS parameters, e.g. dag width, hash algorithm, and chunk size, that affect the resulting CID, such that given the profile and input data different implementations will generate identical CIDs.

hsanjuan · 2025-08-18T13:15:28Z

src/ipips/ipip-0499.md

+This lack of determinism makes has a number of drawbacks:
+
+- It is difficult to verify content across different tools and implementations, as the same content may yield different CIDs.
+- Users are requires to store and transfer UnixFS merkle proofs in order to verify CIDs, adding storage overhead, network bandwidth, and complexity to the verification process.


Suggested change

- Users are requires to store and transfer UnixFS merkle proofs in order to verify CIDs, adding storage overhead, network bandwidth, and complexity to the verification process.

- Users are required to store and transfer UnixFS merkle proofs in order to verify CIDs, adding storage overhead, network bandwidth, and complexity to the verification process.

hsanjuan · 2025-08-18T13:16:37Z

src/ipips/ipip-0499.md

+
+By introducing profiles, we can benefit from both the optionality offered by UnixFS, where users are free to chose their own parameters, and determinism through profiles.
+
+UnixFS CIDs can be generated with a variety of settings and optimizations for chunking, DAG width, and more. This means the same file tree can yield multiple, different CIDs depending on which tools and settings are used, and it is not possible to reliably reproduce or verify the CID. Profiles offer With profiles, following the same profile will produce identical CIDs for identical content, whic makes verification regardless of implementation.


Re-iterative, typos. Last sentence does not compile in natural language.

hsanjuan · 2025-08-18T13:18:09Z

src/ipips/ipip-0499.md

+1. CID version (currently only CIDv0 or CIDv1)
+1. Hash function
+1. UnixFS chunk size
+1. UnixFS DAG layout (e.g. balanced, trickle)


Suggested change

1. UnixFS DAG layout (e.g. balanced, trickle)

1. UnixFS DAG layout (e.g. balanced, trickle etc...)

We probably need to fix the unixfs spec as it doesn't really say how the layout works. For a start, the DAG layout is not part of the unixfs "format", but rather unixfs protobufs are embedded in merkledag-pb protobufs, that could be cbors or jsons. So either we set that in stone (unixfs dag == protobuf-pb) or we need to be explicit in profiles.

Then there is the part about the subtleties of the layout. i.e. the trickle parameters, the balancing of the balanced dag etc, which are also badly specified. It may be that now you can get away with UnixFS DAG width for configuring existing DAG layouts, but I would imagine layouts with more options than the width.

hsanjuan · 2025-08-18T13:20:53Z

src/ipips/ipip-0499.md

+1. UnixFS DAG layout (e.g. balanced, trickle)
+1. UnixFS DAG width (max number of links per `File` node)
+1. `HAMTDirectory` fanout (must be a power of 2)
+1. `HAMTDirectory` threshold (max `Directory` size before switching to `HAMTDirectory`): based on an estimate of the block size by counting the size of PNNode.Links


If this number is dynamic based on the lengths of the actual link entries in the dag, we will need to specify what algorithm that estimation follows. I would put such things in a special "ipfs legacy" profile to be honest, along with cidv0, non-raw leaves etc. We probably should heavily discourage coming up with profiles that do weird things, like dynamically setting params or not using raw-leaves for things.

So, each layout would have its own set of layout-params:

- balanced:
  - max-links: N
- trickle:
  - max-leaves-per-level: N
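One way to picture per-layout parameters is a tagged structure where each layout carries only its own settings. The sketch below is hypothetical: the class names, field names, and the `example-balanced` profile are assumptions made for illustration, the values are arbitrary, and nothing here is a schema defined by the IPIP or by any implementation.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class BalancedLayout:
    max_links: int  # "max-links: N" from the comment above


@dataclass(frozen=True)
class TrickleLayout:
    max_leaves_per_level: int  # "max-leaves-per-level: N" from the comment above


@dataclass(frozen=True)
class Profile:
    # All field names here are hypothetical; the IPIP does not define a schema.
    name: str
    hash_function: str
    chunk_size: int
    layout: Union[BalancedLayout, TrickleLayout]

    def __post_init__(self) -> None:
        # Reject values that could never describe a usable DAG.
        if self.chunk_size < 1:
            raise ValueError("chunk_size must be positive")
        width = (
            self.layout.max_links
            if isinstance(self.layout, BalancedLayout)
            else self.layout.max_leaves_per_level
        )
        if width < 2:
            raise ValueError("layout width must be at least 2")


example = Profile(
    name="example-balanced",  # hypothetical profile name
    hash_function="sha2-256",
    chunk_size=1024 * 1024,  # illustrative value
    layout=BalancedLayout(max_links=1024),  # illustrative value
)
```

Keeping the layout as its own type means a future layout with more options than a width can add fields without touching the other layouts.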

hsanjuan · 2025-08-18T13:23:01Z

src/ipips/ipip-0499.md

+1. Whether empty directories are included in the DAG
+  - Some implementations apply filtering before merkleizing filesystem entries in the DAG.


This is weird, because then we need to consider empty files, hidden files, unreadable files, symlinks and symlink follows, so probably need to mention all those as part of the profile too?

hsanjuan · 2025-08-18T13:25:22Z

src/ipips/ipip-0499.md

+The profiles define a set of parameters that affect the resulting CID. These parameters are based on the UnixFS specification and are used to generate the CID for a given file tree. The parameters include:
+
+1. CID version (currently only CIDv0 or CIDv1)
+1. Hash function


This subtly means that the hash function will be expected to be the same for all the nodes in the DAG in question. I'm not sure if that is a requirement that is written anywhere, so technically you can build unixfs DAGs with multiple hash functions (for fun right?).

hsanjuan · 2025-08-18T13:41:34Z

src/ipips/ipip-0499.md

+
+The profiles define a set of parameters that affect the resulting CID. These parameters are based on the UnixFS specification and are used to generate the CID for a given file tree. The parameters include:
+
+1. CID version (currently only CIDv0 or CIDv1)


Does this matter? Or how much? Currently, in Merkledag-PB, links are the multihashes, so in principle, the cid version used (and the multibase) just decides the final presentation of the root CID. If profiles affect only unixfs, the codec is also fixed. If we have the same multihash, the same codec, the only thing that can change is the CID-encoding base if we have one.

So if we want a profile to dictate exactly the final string representation of the root CID, we need to list "multibase". And if not, if we are happy with the profile just producing equivalent CIDs (potentially in different bases), then CID version does not fully matter.
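The presentation point can be checked with a small sketch that wraps one sha2-256 multihash as a CIDv0 string and as CIDv1 strings in two multibases. The helper names are hypothetical and the base58btc encoder is hand-rolled for self-containment; the sketch covers only the string form of a single block's CID and does not model links between blocks.

```python
import base64
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc(raw: bytes) -> str:
    """Encode bytes as base58btc; each leading zero byte becomes a leading '1'."""
    n = int.from_bytes(raw, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    pad = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * pad + out


def sha256_multihash(block: bytes) -> bytes:
    """Multihash = hash code (0x12 for sha2-256) + digest length (0x20) + digest."""
    return bytes([0x12, 0x20]) + hashlib.sha256(block).digest()


def cid_v0(multihash: bytes) -> str:
    """CIDv0 is the bare sha2-256 multihash in base58btc (dag-pb is implied)."""
    return base58btc(multihash)


def cid_v1(multihash: bytes, codec: int = 0x70, base: str = "base32") -> str:
    """CIDv1 = multibase prefix + encode(version 0x01 + codec + multihash); 0x70 is dag-pb."""
    raw = bytes([0x01, codec]) + multihash
    if base == "base32":
        return "b" + base64.b32encode(raw).decode("ascii").lower().rstrip("=")
    if base == "base16":
        return "f" + raw.hex()
    raise ValueError(f"unsupported multibase in this sketch: {base}")


mh = sha256_multihash(b"example block bytes")
print(cid_v0(mh))                 # base58btc string
print(cid_v1(mh))                 # base32 string
print(cid_v1(mh, base="base16"))  # base16 string
```

All three strings carry the same multihash, so a profile that must pin the exact root string would also have to pin the multibase, as noted above.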

hsanjuan · 2025-08-18T13:48:36Z

src/ipips/ipip-0499.md

+
+### Compatibility
+
+UnixFS Data encoded with the profiles defined in this IPIP is fully compatible with existing implementations, as it is fully compliant with the UnixFS specification.


Cannot be compliant with details that are not specified as of today.

hsanjuan · 2025-08-18T13:50:17Z

src/ipips/ipip-0499.md

+
+### Alternatives
+
+As an alternative to profiles, users can store and transfer CAR files of UnixFS content, which include the merkle proofs needed to verify the CID.


Suggested change

As an alternative to profiles, users can store and transfer CAR files of UnixFS content, which include the merkle proofs needed to verify the CID.

As an alternative to profiles, users can store and transfer CAR files of UnixFS content, which include the merkle DAG nodes needed to verify the CID.

Create ipip-0000.md

8842176

mishmosh requested a review from a team as a code owner April 3, 2025 14:03

Update and rename ipip-0000.md to ipip-0499.md

4ba68f0

mishmosh changed the title ~~Create ipip-0000.md: CID profiles~~ IPIP 0499: CID Profiles Apr 3, 2025

2color reviewed Apr 3, 2025

2color mentioned this pull request Apr 3, 2025

Inconsistent CID Calculation with Example: Addressing a file by CID with UnixFS ipfs/helia#765

Open

bumblefudge reviewed Apr 4, 2025

bumblefudge reviewed Apr 4, 2025

bumblefudge reviewed Apr 4, 2025

lidel reviewed Apr 11, 2025

lidel mentioned this pull request Apr 11, 2025

feat(config): ipfs add and Import options for controling UnixFS DAG Width ipfs/kubo#10774

Merged

lidel added a commit to ipfs/kubo that referenced this pull request Apr 15, 2025

refactor: test-cid-v1-wide with UnixFSHAMTDirectoryMaxFanout=1024

b08bc4d

lets make the fanout match the max links from files and rename profile to `-wide` this will make it easier to discuss in ipfs/specs#499

lidel and others added 2 commits April 15, 2025 23:41

add extra attributes proposed in review

6cc64cb

Co-authored-by: Bumblefudge <bumblefudge@learningproof.xyz>

incorporate kubo#10774

d8b8389

Import.* config params for controlling DAG width were added in: ipfs/kubo#10774

This was referenced Apr 18, 2025

Publish UnixFS specifications at specs.ipfs.tech #331

Open

Protocol stewardship and improvements — IPFS/2025 ipshipyard/roadmaps#16

Open

Merge branch 'main' into patch-1

600d1fc

BrewTestBot mentioned this pull request May 21, 2025

ipfs 0.35.0 Homebrew/homebrew-core#224309

Merged

2color reviewed May 23, 2025

2color reviewed May 29, 2025

jaller94 reviewed Jun 30, 2025

SethDocherty mentioned this pull request Jul 1, 2025

Difference in CID Generation between IPFS and Singularity data-preservation-programs/singularity#525

Open

2color and others added 6 commits August 12, 2025 09:21

Update src/ipips/ipip-0499.md

595588c

Co-authored-by: Christian Paul <info@jaller.de>

add daniel as editor

41f9b86

edit summary and motivation

229988f

edit summary

f37e610

edit parameters and design

7a12f0a

edit user benefit and compatibility

ff69e56

refine parameters and introduce a named profile

09baf68

2color requested review from lidel, aschmahmann, hsanjuan and rvagg August 12, 2025 10:29

hsanjuan reviewed Aug 18, 2025

2color mentioned this pull request Aug 18, 2025

IPFS hash feature use non-specified algorithm which is not widely compatible in the ecosystem ethereum/solidity#14389

Open


