-
Notifications
You must be signed in to change notification settings - Fork 236
Initial UnixFS specification #331
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
53dbc07
to
64c86d3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @Jorropo! I know how time-consuming spelunking code and writing specs is, this is effort is very appreciated ❤️
Did a first pass read and dropped some comments – mostly questions about things I did not know and trying to confirm I understood them right + flagging sections we should research or reorder / rephrase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad this is moving along 🎉.
Spec documents where things unrelated to the spec such as implementation details, alternatives considered, etc. are intertwined with the spec such that they're hard to distinguish is painful. The UnixFS spec is confusing enough without these extra distractions, many of which came from the previous version of the spec, so let's drop them. If people want to keep them around then moving them somewhere separate (e.g. to an appendix) would be great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made some initial writing suggestions here, please lmk if it's helpful @Jorropo. I didn't get to everything. I can make another pass shortly. I know this is a work-in-progress.
UNIXFS.md
Outdated
- JavaScript | ||
- Data Formats - [unixfs](https://github.com/ipfs/js-ipfs-unixfs) | ||
- Importer - [unixfs-importer](https://github.com/ipfs/js-ipfs-unixfs-importer) | ||
- Exporter - [unixfs-exporter](https://github.com/ipfs/js-ipfs-unixfs-exporter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also add
https://github.com/ipld/js-unixfs
What about https://github.com/web3-storage/fast-unixfs-exporter
I just learned of these in the last couple of days and from what I understand they are actively maintained and used by DAGHouse
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/ipld/js-unixfs ✅ - yes we use this for encoding UnixFS DAGs now
https://github.com/web3-storage/fast-unixfs-exporter ❌ - temporary fork now deprecated
@ElPaisano : are you able to review, particularly for a grammar/organization regard? |
- annotate which fields are required for specific DataTypes - clarify fanout is required for HAMTShard despite optional in schema - specify when implementations should include Tsize values - document Data field usage for each type - mark MimeType as reserved for future use addresses review comments: ipfs#331 (comment) ipfs#331 (comment)
Moves implementation-specific block size guidance to appendix section. Keeps spec focused on format while providing practical guidance separately. Related to PR ipfs#331 review feedback
Filesize is mandatory for Type=File and Type=Raw, defaults to 0 if omitted. While marked optional in protobuf for compatibility with other types. Addresses review feedback about filesize field requirements
Mode and mtime SHOULD NOT be included by default to preserve deduplication. Must be explicitly requested by user. ipfs#331 (comment)
- Fanout SHOULD be multiple of 8 for byte-aligned bitfield - Bitfield MUST be written, SHOULD be read for efficiency - Document common sharding threshold (256 KiB - 1 MiB) - Clarify bitfield is little-endian, size is fanout/8 bytes Based on analysis of boxo, go-unixfsnode, and JS implementations ipfs#331 (comment)
Remove optional/required keywords for proto3 compatibility. Add comments for semantically required fields that must be validated at the application layer since proto3 doesn't support required fields. ipfs#331 (comment)
Applied text suggestions from PR ipfs#331 review comments: - ipfs#331 (comment) - ipfs#331 (comment) - ipfs#331 (comment) - ipfs#331 (comment) Note: Suggestions r1039081771, r1039082139, r1039082250 were for the old UNIXFSv1.md file and don't apply to the current unixfs.md which has already been revised.
Verified through code inspection that symlinks MUST NOT have children in PBNode.Links. Removed the TODO comment and made the requirement explicit. Addresses: ipfs#331 (comment)
Added note that single-block files SHOULD prefer raw codec (0x55) over dag-pb for canonical CID to avoid protobuf overhead. Addresses: ipfs#331 (comment)
Added note explaining that path resolution is mostly IPFS semantics over UnixFS, except for types like HAMTDirectory where the resolution algorithm is part of the data structure itself. Addresses: ipfs#331 (comment)
Simplified MUST requirements, clarified fanout MUST be multiple of 8, and added clearer guidance on choosing fanout values (256 vs 1024) with their trade-offs. Addresses: ipfs#331 (comment)
Added clarification that child blocks in multi-block files can be either raw blocks (0x55) or dag-pb blocks (0x70). Addresses: ipfs#331 (comment)
adds rationale for the specific sorting choice - universal and locale-independent ordering across implementations ipfs#331 (comment)
specifies that duplicates are not allowed but readers must handle them gracefully by returning first match, following robustness principle. writers must clean up duplicates when mutating. resolves inline TODO at line 322
removes TODO and documents that POSIX explicitly prohibits zero-length filenames resolves inline TODO at line 684
rust-ipfs was archived in 2022 and unixfsv1 is not actively maintained, confirming they should remain hidden resolves inline TODO at line 1040
fixes markdown linting issues found by superlinter: - removed trailing spaces from lines 259, 319-324 - removed extra blank line at line 1040
documents commonly encountered empty structures including: - empty dag-pb directories (CIDv0, CIDv1, inlined) - empty dag-pb files (CIDv0, CIDv1) - empty raw blocks (CIDv1, inlined) these CIDs are frequently hardcoded in implementations for performance optimization
moved Jorropo to former_editors as he authored the comprehensive UnixFS spec rewrite in PR ipfs#331 added contributors from PR ipfs#331 review process to thanks section, ordered chronologically by first contribution to ipfs ecosystem touching UnixFS, including useful code comments that helped shaping this spec: - earliest contributions to ipfs/specs repo (2015-2021) - UnixFS implementation work in kubo/boxo (2019-2020) - PR ipfs#331 reviews and feedback (2022-2025)
moved Jorropo to former_editors as he authored the comprehensive UnixFS spec rewrite in PR ipfs#331 added contributors from PR ipfs#331 review process to thanks section, ordered chronologically by first contribution to ipfs ecosystem touching UnixFS, including useful code comments that helped shaping this spec: - earliest contributions to ipfs/specs repo (2015-2021) - UnixFS implementation work in kubo/boxo (2019-2020) - PR ipfs#331 reviews and feedback (2022-2025)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"The (Great) Spec of UnixFS" by Pieter Bruegel the Elder (1563)
Thanks everyone who contributed to this PR over the past 2+ years, but also worked on UnixFS implementations in the past. Special thanks to @Jorropo for doing the heavy lifting - writing the initial version and spelunking through legacy implementations to document how UnixFS actually works.
Did my best to incorporate the feedback and added real-world test vectors from the gateway-conformance project. These test vectors should help when the spec doesn't cover every edge case.
This likely isn't perfect, but we're merging as-is to avoid letting this rot further. If you spot issues or have improvements, please open a new PR - much easier to review focused changes than revisiting this massive rewrite.
Merging. The spec is marked as "draft" status, so we expect iterations. Looking forward to your PRs!
I've been quite away from this, I'm glad to see all the work you peoples have done on this PR. |
This need some touch up, (Table of Content, fixtures, ...) but I would to gather feedback first.
cc #162 #316