Skip to content

Conversation

lidel
Copy link
Member

@lidel lidel commented Aug 28, 2025

Important

Feedback on the default in that PR would be appreciated – it will likely act as ecosystem-wide lowest-common-denominator.

This PR aims to kick-off discussion about big identity CID that currently are allowed by the boxo/verifcid check and can be loaded by `boxo/gateway.

It also aims to address ipfs/kubo#6011.

Problem

Identity CIDs (multihash code 0x00) are unique in that they inline data directly into the CID itself rather than hashing it. While this can be useful for very small data that would be larger to reference than to inline, the lack of size limits created several problems:

  1. Abuse potential: Without limits, identity CIDs could embed arbitrary amounts of data, effectively turning CIDs from content addresses into data containers.
  2. Gateway resource waste: Poorly written clients could encode large payloads as identity CIDs and echo them through gateways, consuming bandwidth and resources without providing value.
  3. Unclear boundaries: The ecosystem lacked clear guidelines on when identity CIDs are appropriate, leading to potential misuse. We see people building on shaky foundations "because it works" (example)
  4. Security concerns: Identity CIDs provide no integrity verification (vulnerable to bit flips), making large identity CIDs particularly risky.
  5. Carefully audit and reason through inline CIDs kubo#6011 👈 this is a real BUG: DagModifier preserves the original CID prefix (including identity hash type) which means we can end up with a really big CID

Historical discussions:

This PR

  • Limit identity CIDs to 128 bytes maximum to prevent abuse.
    • The limit of 128 bytes matches the existing boxo/verifcid.DefaultMaxDigestSize that was already applied to digests produced by supported cryptographic functions.
    • 👉 The value 128 aims to kick-off discussion. If this is too low, we could introduce separate, higher limit just for identity multihashes. But we need SOME OFFICIAL LIMIT, at least for boxo/gateway and DagModifier. Feedback welcome.
  • DagModifier respects boxo/verifcid.DefaultMaxIdentityDigestSize and switches to non-identity on mutation that would overflow the limit
    • Note this is relatively niche fix. It does not impact ipfs add, only the niche ipfs files write which acts as append operation or people who use boxo to manually mutate DAGs that already had inlined CIDs.

Changes

  • BREAKING CHANGE: CIDs with identity multihash digests longer than boxo/verifcid.DefaultMaxIdentityDigestSize now produce ErrAboveMaxDigestSize error.
  • renamed errors for consistency
  • made MaxDigestSize public and reused in boxo/bitswap
  • additional regression tests in blockstore an gateway just to be sure limit is respected even if we ever refactor or remove boxo/verifcid

Alternative

We could also limit this to boxo/gateway, and enforce identity digest length to 128 there, while allowing it everywhere else. I think this is leaving surface for abuse and bugs, better to be consistent across the stack, but happy to be convinced otherwise.

Copy link

codecov bot commented Aug 28, 2025

Codecov Report

❌ Patch coverage is 70.07299% with 41 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.59%. Comparing base (971bdfe) to head (b497a55).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
ipld/unixfs/mod/dagmodifier.go 76.31% 14 Missing and 4 partials ⚠️
mfs/file.go 12.50% 13 Missing and 1 partial ⚠️
verifcid/allowlist.go 78.57% 4 Missing and 2 partials ⚠️
verifcid/cid.go 82.35% 2 Missing and 1 partial ⚠️

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1018      +/-   ##
==========================================
+ Coverage   60.51%   60.59%   +0.08%     
==========================================
  Files         268      268              
  Lines       33487    33616     +129     
==========================================
+ Hits        20264    20370     +106     
- Misses      11543    11560      +17     
- Partials     1680     1686       +6     
Files with missing lines Coverage Δ
bitswap/options.go 42.85% <ø> (ø)
bitswap/server/internal/decision/engine.go 92.14% <ø> (ø)
bitswap/server/server.go 57.20% <ø> (ø)
verifcid/cid.go 88.00% <82.35%> (-12.00%) ⬇️
verifcid/allowlist.go 73.07% <78.57%> (+6.41%) ⬆️
mfs/file.go 58.16% <12.50%> (-4.06%) ⬇️
ipld/unixfs/mod/dagmodifier.go 61.47% <76.31%> (+2.26%) ⬆️

... and 8 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Limit identity CIDs to 128 bytes maximum to prevent abuse. Identity CIDs
inline data directly and without a limit could embed arbitrary amounts,
potentially being used by poorly written clients to echo back big amounts
of unnecessary data.

The limit matches the existing MaxDigestSize for cryptographic functions.
Identity CIDs lack integrity verification and should only be used in
controlled contexts like raw leaves when the size of data is smaller than
a regular CID.

BREAKING CHANGE: CIDs with identity multihash digests longer than
MaxDigestSize (128 bytes) now produce ErrAboveMaxDigestSize error.

Also renamed errors for consistency (old names deprecated but still work):
- ErrBelowMinimumHashLength → ErrBelowMinDigestSize
- ErrAboveMaximumHashLength → ErrAboveMaxDigestSize
@lidel lidel force-pushed the feat/limit-identity-cid-size branch from 8fb3910 to f484458 Compare August 28, 2025 23:38
@lidel lidel mentioned this pull request Aug 29, 2025
30 tasks
verifcid/cid.go Outdated
Comment on lines 12 to 15
// MinDigestSize is the minimum size for hash digests (except for identity hashes)
MinDigestSize = 20
// MaxDigestSize is the maximum size for all digest types (including identity)
MaxDigestSize = 128
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mutable globals generally don't feel good, particularly at a library layer (e.g. it hurts composability / interactions between multiple projects using the library). The reason we have passable Allowlist's at all is to not have globals.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, I'll refactor this to use Allowlist and see how that would look like

Copy link
Member Author

@lidel lidel Aug 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aschmahmann take a look, added this to Allowlist, and moved these consts to only act as Default* anchors that everyone can override in their own Allowlist implementation.

We also have separate verifcid.DefaultMaxIdentityDigestSize which right now is 128, but we can make it something different than the cryptographic verifcid.DefaultMaxDigestSize

verifcid/cid.go Outdated
Comment on lines 12 to 15
// MinDigestSize is the minimum size for hash digests (except for identity hashes)
MinDigestSize = 20
// MaxDigestSize is the maximum size for all digest types (including identity)
MaxDigestSize = 128
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My read of multiformats/multihash#130 (comment) and multiformats/multihash#130 (comment) was that at that time 2048 was deemed probably a safe number that was unlikely to break people. Unclear what that number is today.

128 may actually be fine too in practice, but I'd probably want to broadcast this more widely (and flag in js-multiformats as well) before merging with a smaller number like 128.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i will ping stakeholders once its ready for review

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, made this PR ready for review + pinged various stakeholders (old issues, ipfs-implementers slack, etc).

lidel added 3 commits August 29, 2025 21:06
- add ErrDigestTooSmall and ErrDigestTooLarge as primary errors
- add ErrIdentityDigestTooLarge for identity-specific overflow
- update ValidateCid to return specific error for identity overflow
- keep ErrBelowMinimumHashLength and ErrAboveMaximumHashLength as deprecated aliases
- prevent creation of identity CIDs exceeding verifcid.MaxDigestSize
- preserve raw node codec when modifying data under chunker threshold
- convert RawNode to UnixFS structure for append operations
- inherit full CID prefix from parent directories in MFS

When modifying nodes with identity CIDs, automatically switch to
cryptographic hash when data would exceed the size limit. RawNodes
that need to grow beyond a single block are converted to UnixFS
file structures with the original raw data as the first leaf.

Fixes ipfs/kubo#6011
- introduce MaxIdentityDigestSize for identity-specific size limits
- update all identity CID checks to use new constant
- improve safePrefixForSize logic to only check size for identity hashes

This provides clearer semantic distinction between general digest
limits and identity-specific limits, making the code more maintainable
and future-proof.
addresses pr feedback by converting formatted errors to sentinel errors
and adding private helper functions that provide detailed error context

- sentinel errors are immutable and can be checked with errors.Is()
- helper functions add size details when validation fails
- maintains backward compatibility with error checking
- update gateway test to match new error format

Signed-off-by: Marcin Rataj <lidel@lidel.org>
@lidel lidel force-pushed the feat/limit-identity-cid-size branch from 11136c9 to 9b014be Compare August 30, 2025 02:44
lidel added 2 commits August 30, 2025 18:29
- expand Allowlist interface with MinDigestSize/MaxDigestSize methods
- support different size limits per hash function type
- identity CIDs use separate DefaultMaxIdentityDigestSize limit
- cryptographic hashes use DefaultMaxDigestSize limit
- replace magic numbers with named Default* constants throughout
- update bitswap to reference verifcid defaults
@rvagg
Copy link
Member

rvagg commented Sep 1, 2025

Couple more references that contribute to the size discussion, as long as we're documenting some of this in one place:

FWIW 128 seems reasonable to me. I'm happy to have them squished down their happy-path use to a size where they're more likely being used for their size-saving utility rather than some other hacky purpose that stretches the whole stack. The higher you go, the harder it is to justify their use IMO.

@vmx
Copy link
Member

vmx commented Sep 1, 2025

I'm not a fan of large identity CIDs. They are a special case (and a pain to deal with within a code base) and are mostly used for things that should be solved differently. I see the use case of the data is smaller than the hashed value. So for me, the smaller the value the better. Hence 128 bytes sound reasonable to me.

blockservice already validates CIDs, no need to duplicate in gateway

per @aschmahmann feedback on #1018
@lidel lidel marked this pull request as ready for review September 2, 2025 01:44
@lidel lidel requested a review from a team as a code owner September 2, 2025 01:44
Comment on lines +12 to +21
// DefaultMinDigestSize is the default minimum size for hash digests (except for identity hashes)
DefaultMinDigestSize = 20
// DefaultMaxDigestSize is the default maximum size for cryptographic hash digests.
// This does not apply to identity hashes which are not cryptographic and use DefaultMaxIdentityDigestSize instead.
DefaultMaxDigestSize = 128
// DefaultMaxIdentityDigestSize is the default maximum size for identity CID digests.
// Identity CIDs (with multihash code 0x00) are not cryptographic hashes - they embed
// data directly in the CID. This separate limit prevents abuse while allowing
// different size constraints than cryptographic digests.
DefaultMaxIdentityDigestSize = 128
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@achingbrain would these work for Helia? I'd like to have this documented at https://specs.ipfs.tech/ at some point, but first we need to align GO and JS implementations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks fine at first glance 👍

@gammazero gammazero self-requested a review September 2, 2025 16:49
Copy link
Contributor

@gammazero gammazero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only some tiny code style suggestions for a few less lines of code.

- fail instead of skip when test setup is broken
- simplify makeTestData helper function
- use bytes.Repeat for single-character byte slices

addresses PR review:
#1018 (comment)
#1018 (comment)
#1018 (comment)
combined CHANGELOG entries for both bitswap changes
@lidel
Copy link
Member Author

lidel commented Sep 9, 2025

Ok, seems everyone received this proposal positively, and no concerns were raised in last 2 weeks.
I'm going to merge this and we will ship Kubo 0.38 RC1 with this to solicit feedback from early testers.

@lidel lidel merged commit e69f67e into main Sep 9, 2025
21 of 23 checks passed
@github-project-automation github-project-automation bot moved this from Todo to Done in @lidel's IPFS wishlist Sep 9, 2025
@lidel lidel deleted the feat/limit-identity-cid-size branch September 9, 2025 17:02
lidel added a commit to ipfs/kubo that referenced this pull request Sep 9, 2025
* fix: enforce identity CID size limits

- validate --inline-limit against verifcid.MaxDigestSize
- add error when --hash=identity exceeds size limit
- add tests for identity CID overflow scenarios
- update help text to show maximum inline limit

This prevents creation of unbounded identity CIDs by enforcing
the 128-byte limit defined in ipfs/boxo#1018

Fixes #6011
IPIP: ipfs/specs#512
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants