-
Notifications
You must be signed in to change notification settings - Fork 132
fix(verifcid): enforce size limit for identity CIDs #1018
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report❌ Patch coverage is @@ Coverage Diff @@
## main #1018 +/- ##
==========================================
+ Coverage 60.51% 60.59% +0.08%
==========================================
Files 268 268
Lines 33487 33616 +129
==========================================
+ Hits 20264 20370 +106
- Misses 11543 11560 +17
- Partials 1680 1686 +6
... and 8 files with indirect coverage changes 🚀 New features to boost your workflow:
|
Limit identity CIDs to 128 bytes maximum to prevent abuse. Identity CIDs inline data directly and without a limit could embed arbitrary amounts, potentially being used by poorly written clients to echo back big amounts of unnecessary data. The limit matches the existing MaxDigestSize for cryptographic functions. Identity CIDs lack integrity verification and should only be used in controlled contexts like raw leaves when the size of data is smaller than a regular CID. BREAKING CHANGE: CIDs with identity multihash digests longer than MaxDigestSize (128 bytes) now produce ErrAboveMaxDigestSize error. Also renamed errors for consistency (old names deprecated but still work): - ErrBelowMinimumHashLength → ErrBelowMinDigestSize - ErrAboveMaximumHashLength → ErrAboveMaxDigestSize
8fb3910
to
f484458
Compare
verifcid/cid.go
Outdated
// MinDigestSize is the minimum size for hash digests (except for identity hashes) | ||
MinDigestSize = 20 | ||
// MaxDigestSize is the maximum size for all digest types (including identity) | ||
MaxDigestSize = 128 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mutable globals generally don't feel good, particularly at a library layer (e.g. it hurts composability / interactions between multiple projects using the library). The reason we have passable Allowlist's at all is to not have globals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, I'll refactor this to use Allowlist and see how that would look like
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aschmahmann take a look, added this to Allowlist, and moved these consts to only act as Default*
anchors that everyone can override in their own Allowlist implementation.
We also have separate verifcid.DefaultMaxIdentityDigestSize
which right now is 128, but we can make it something different than the cryptographic verifcid.DefaultMaxDigestSize
verifcid/cid.go
Outdated
// MinDigestSize is the minimum size for hash digests (except for identity hashes) | ||
MinDigestSize = 20 | ||
// MaxDigestSize is the maximum size for all digest types (including identity) | ||
MaxDigestSize = 128 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My read of multiformats/multihash#130 (comment) and multiformats/multihash#130 (comment) was that at that time 2048 was deemed probably a safe number that was unlikely to break people. Unclear what that number is today.
128 may actually be fine too in practice, but I'd probably want to broadcast this more widely (and flag in js-multiformats as well) before merging with a smaller number like 128.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, i will ping stakeholders once its ready for review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, made this PR ready for review + pinged various stakeholders (old issues, ipfs-implementers slack, etc).
- add ErrDigestTooSmall and ErrDigestTooLarge as primary errors - add ErrIdentityDigestTooLarge for identity-specific overflow - update ValidateCid to return specific error for identity overflow - keep ErrBelowMinimumHashLength and ErrAboveMaximumHashLength as deprecated aliases
- prevent creation of identity CIDs exceeding verifcid.MaxDigestSize - preserve raw node codec when modifying data under chunker threshold - convert RawNode to UnixFS structure for append operations - inherit full CID prefix from parent directories in MFS When modifying nodes with identity CIDs, automatically switch to cryptographic hash when data would exceed the size limit. RawNodes that need to grow beyond a single block are converted to UnixFS file structures with the original raw data as the first leaf. Fixes ipfs/kubo#6011
- introduce MaxIdentityDigestSize for identity-specific size limits - update all identity CID checks to use new constant - improve safePrefixForSize logic to only check size for identity hashes This provides clearer semantic distinction between general digest limits and identity-specific limits, making the code more maintainable and future-proof.
addresses pr feedback by converting formatted errors to sentinel errors and adding private helper functions that provide detailed error context - sentinel errors are immutable and can be checked with errors.Is() - helper functions add size details when validation fails - maintains backward compatibility with error checking - update gateway test to match new error format Signed-off-by: Marcin Rataj <lidel@lidel.org>
11136c9
to
9b014be
Compare
- expand Allowlist interface with MinDigestSize/MaxDigestSize methods - support different size limits per hash function type - identity CIDs use separate DefaultMaxIdentityDigestSize limit - cryptographic hashes use DefaultMaxDigestSize limit - replace magic numbers with named Default* constants throughout - update bitswap to reference verifcid defaults
Couple more references that contribute to the size discussion, as long as we're documenting some of this in one place:
FWIW 128 seems reasonable to me. I'm happy to have them squished down their happy-path use to a size where they're more likely being used for their size-saving utility rather than some other hacky purpose that stretches the whole stack. The higher you go, the harder it is to justify their use IMO. |
I'm not a fan of large identity CIDs. They are a special case (and a pain to deal with within a code base) and are mostly used for things that should be solved differently. I see the use case of the data is smaller than the hashed value. So for me, the smaller the value the better. Hence 128 bytes sound reasonable to me. |
blockservice already validates CIDs, no need to duplicate in gateway per @aschmahmann feedback on #1018
// DefaultMinDigestSize is the default minimum size for hash digests (except for identity hashes) | ||
DefaultMinDigestSize = 20 | ||
// DefaultMaxDigestSize is the default maximum size for cryptographic hash digests. | ||
// This does not apply to identity hashes which are not cryptographic and use DefaultMaxIdentityDigestSize instead. | ||
DefaultMaxDigestSize = 128 | ||
// DefaultMaxIdentityDigestSize is the default maximum size for identity CID digests. | ||
// Identity CIDs (with multihash code 0x00) are not cryptographic hashes - they embed | ||
// data directly in the CID. This separate limit prevents abuse while allowing | ||
// different size constraints than cryptographic digests. | ||
DefaultMaxIdentityDigestSize = 128 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@achingbrain would these work for Helia? I'd like to have this documented at https://specs.ipfs.tech/ at some point, but first we need to align GO and JS implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks fine at first glance 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only some tiny code style suggestions for a few less lines of code.
- fail instead of skip when test setup is broken - simplify makeTestData helper function - use bytes.Repeat for single-character byte slices addresses PR review: #1018 (comment) #1018 (comment) #1018 (comment)
combined CHANGELOG entries for both bitswap changes
Ok, seems everyone received this proposal positively, and no concerns were raised in last 2 weeks. |
* fix: enforce identity CID size limits - validate --inline-limit against verifcid.MaxDigestSize - add error when --hash=identity exceeds size limit - add tests for identity CID overflow scenarios - update help text to show maximum inline limit This prevents creation of unbounded identity CIDs by enforcing the 128-byte limit defined in ipfs/boxo#1018 Fixes #6011 IPIP: ipfs/specs#512
Important
Feedback on the default in that PR would be appreciated – it will likely act as ecosystem-wide lowest-common-denominator.
This PR aims to kick-off discussion about big identity CID that currently are allowed by the
boxo/verifcid
check and can be loaded by `boxo/gateway.It also aims to address ipfs/kubo#6011.
Problem
Identity CIDs (multihash code 0x00) are unique in that they inline data directly into the CID itself rather than hashing it. While this can be useful for very small data that would be larger to reference than to inline, the lack of size limits created several problems:
DagModifier
preserves the original CID prefix (including identity hash type) which means we can end up with a really big CIDHistorical discussions:
This PR
boxo/verifcid.DefaultMaxDigestSize
that was already applied to digests produced by supported cryptographic functions.identity
multihashes. But we need SOME OFFICIAL LIMIT, at least forboxo/gateway
andDagModifier
. Feedback welcome.DagModifier
respectsboxo/verifcid.DefaultMaxIdentityDigestSize
and switches to non-identity on mutation that would overflow the limitipfs add
, only the nicheipfs files write
which acts as append operation or people who use boxo to manually mutate DAGs that already had inlined CIDs.boxo/verifcid.DefaultMaxIdentityDigestSize
, then inherits parent CIDs hash function, or the default.Changes
boxo/verifcid.DefaultMaxIdentityDigestSize
now produceErrAboveMaxDigestSize
error.MaxDigestSize
public and reused inboxo/bitswap
boxo/verifcid
Alternative
We could also limit this to
boxo/gateway
, and enforce identity digest length to 128 there, while allowing it everywhere else. I think this is leaving surface for abuse and bugs, better to be consistent across the stack, but happy to be convinced otherwise.