Skip to content

Conversation

Jorropo
Copy link
Contributor

@Jorropo Jorropo commented Oct 10, 2022

This need some touch up, (Table of Content, fixtures, ...) but I would to gather feedback first.

cc #162 #316

@Jorropo Jorropo self-assigned this Oct 10, 2022
@Jorropo Jorropo marked this pull request as ready for review December 2, 2022 17:57
@Jorropo Jorropo force-pushed the unixfs branch 2 times, most recently from 53dbc07 to 64c86d3 Compare December 2, 2022 17:59
Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Jorropo! I know how time-consuming spelunking code and writing specs is, this is effort is very appreciated ❤️

Did a first pass read and dropped some comments – mostly questions about things I did not know and trying to confirm I understood them right + flagging sections we should research or reorder / rephrase.

Copy link
Contributor

@aschmahmann aschmahmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad this is moving along 🎉.

Spec documents where things unrelated to the spec such as implementation details, alternatives considered, etc. are intertwined with the spec such that they're hard to distinguish is painful. The UnixFS spec is confusing enough without these extra distractions, many of which came from the previous version of the spec, so let's drop them. If people want to keep them around then moving them somewhere separate (e.g. to an appendix) would be great.

Copy link
Contributor

@ElPaisano ElPaisano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made some initial writing suggestions here, please lmk if it's helpful @Jorropo. I didn't get to everything. I can make another pass shortly. I know this is a work-in-progress.

UNIXFS.md Outdated
- JavaScript
- Data Formats - [unixfs](https://github.com/ipfs/js-ipfs-unixfs)
- Importer - [unixfs-importer](https://github.com/ipfs/js-ipfs-unixfs-importer)
- Exporter - [unixfs-exporter](https://github.com/ipfs/js-ipfs-unixfs-exporter)
Copy link
Member

@2color 2color Jan 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also add
https://github.com/ipld/js-unixfs

What about https://github.com/web3-storage/fast-unixfs-exporter

I just learned of these in the last couple of days and from what I understand they are actively maintained and used by DAGHouse

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/ipld/js-unixfs ✅ - yes we use this for encoding UnixFS DAGs now
https://github.com/web3-storage/fast-unixfs-exporter ❌ - temporary fork now deprecated

@BigLep
Copy link
Contributor

BigLep commented Jan 21, 2023

@ElPaisano : are you able to review, particularly for a grammar/organization regard?

lidel added 8 commits August 24, 2025 03:19
- annotate which fields are required for specific DataTypes
- clarify fanout is required for HAMTShard despite optional in schema
- specify when implementations should include Tsize values
- document Data field usage for each type
- mark MimeType as reserved for future use

addresses review comments:
ipfs#331 (comment)
ipfs#331 (comment)
Moves implementation-specific block size guidance to appendix section.
Keeps spec focused on format while providing practical guidance separately.

Related to PR ipfs#331 review feedback
Filesize is mandatory for Type=File and Type=Raw, defaults to 0 if omitted.
While marked optional in protobuf for compatibility with other types.

Addresses review feedback about filesize field requirements
Mode and mtime SHOULD NOT be included by default to preserve deduplication.
Must be explicitly requested by user.

ipfs#331 (comment)
- Fanout SHOULD be multiple of 8 for byte-aligned bitfield
- Bitfield MUST be written, SHOULD be read for efficiency
- Document common sharding threshold (256 KiB - 1 MiB)
- Clarify bitfield is little-endian, size is fanout/8 bytes

Based on analysis of boxo, go-unixfsnode, and JS implementations
ipfs#331 (comment)
Remove optional/required keywords for proto3 compatibility.
Add comments for semantically required fields that must be validated
at the application layer since proto3 doesn't support required fields.

ipfs#331 (comment)
lidel added 6 commits August 25, 2025 17:53
Applied text suggestions from PR ipfs#331 review comments:
- ipfs#331 (comment)
- ipfs#331 (comment)
- ipfs#331 (comment)
- ipfs#331 (comment)

Note: Suggestions r1039081771, r1039082139, r1039082250 were for
the old UNIXFSv1.md file and don't apply to the current unixfs.md
which has already been revised.
Verified through code inspection that symlinks MUST NOT have children
in PBNode.Links. Removed the TODO comment and made the requirement
explicit.

Addresses: ipfs#331 (comment)
Added note that single-block files SHOULD prefer raw codec (0x55)
over dag-pb for canonical CID to avoid protobuf overhead.

Addresses: ipfs#331 (comment)
Added note explaining that path resolution is mostly IPFS semantics
over UnixFS, except for types like HAMTDirectory where the resolution
algorithm is part of the data structure itself.

Addresses: ipfs#331 (comment)
Simplified MUST requirements, clarified fanout MUST be multiple of 8,
and added clearer guidance on choosing fanout values (256 vs 1024)
with their trade-offs.

Addresses: ipfs#331 (comment)
lidel added a commit to Jorropo/specs that referenced this pull request Aug 25, 2025
Added clarification that child blocks in multi-block files can be
either raw blocks (0x55) or dag-pb blocks (0x70).

Addresses: ipfs#331 (comment)
adds rationale for the specific sorting choice - universal and
locale-independent ordering across implementations

ipfs#331 (comment)
lidel added 5 commits August 25, 2025 22:15
specifies that duplicates are not allowed but readers must handle
them gracefully by returning first match, following robustness
principle. writers must clean up duplicates when mutating.

resolves inline TODO at line 322
removes TODO and documents that POSIX explicitly prohibits
zero-length filenames

resolves inline TODO at line 684
rust-ipfs was archived in 2022 and unixfsv1 is not actively
maintained, confirming they should remain hidden

resolves inline TODO at line 1040
fixes markdown linting issues found by superlinter:
- removed trailing spaces from lines 259, 319-324
- removed extra blank line at line 1040
documents commonly encountered empty structures including:
- empty dag-pb directories (CIDv0, CIDv1, inlined)
- empty dag-pb files (CIDv0, CIDv1)
- empty raw blocks (CIDv1, inlined)

these CIDs are frequently hardcoded in implementations for
performance optimization
lidel added a commit to Jorropo/specs that referenced this pull request Aug 28, 2025
moved Jorropo to former_editors as he authored the
comprehensive UnixFS spec rewrite in PR ipfs#331

added contributors from PR ipfs#331 review process to thanks section,
ordered chronologically by first contribution to ipfs ecosystem
touching UnixFS, including useful code comments that helped
shaping this spec:
- earliest contributions to ipfs/specs repo (2015-2021)
- UnixFS implementation work in kubo/boxo (2019-2020)
- PR ipfs#331 reviews and feedback (2022-2025)
moved Jorropo to former_editors as he authored the
comprehensive UnixFS spec rewrite in PR ipfs#331

added contributors from PR ipfs#331 review process to thanks section,
ordered chronologically by first contribution to ipfs ecosystem
touching UnixFS, including useful code comments that helped
shaping this spec:
- earliest contributions to ipfs/specs repo (2015-2021)
- UnixFS implementation work in kubo/boxo (2019-2020)
- PR ipfs#331 reviews and feedback (2022-2025)
Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


"The (Great) Spec of UnixFS" by Pieter Bruegel the Elder (1563)

Thanks everyone who contributed to this PR over the past 2+ years, but also worked on UnixFS implementations in the past. Special thanks to @Jorropo for doing the heavy lifting - writing the initial version and spelunking through legacy implementations to document how UnixFS actually works.

Did my best to incorporate the feedback and added real-world test vectors from the gateway-conformance project. These test vectors should help when the spec doesn't cover every edge case.

This likely isn't perfect, but we're merging as-is to avoid letting this rot further. If you spot issues or have improvements, please open a new PR - much easier to review focused changes than revisiting this massive rewrite.

Merging. The spec is marked as "draft" status, so we expect iterations. Looking forward to your PRs!

@lidel lidel merged commit 4dadb31 into ipfs:main Aug 28, 2025
2 checks passed
@github-project-automation github-project-automation bot moved this from 🏃 In Progress to ✅ Ratified in IPIP pipeline Aug 28, 2025
@Jorropo
Copy link
Contributor Author

Jorropo commented Aug 28, 2025

I've been quite away from this, I'm glad to see all the work you peoples have done on this PR.
🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Ratified
Status: 🔎 In Review
Development

Successfully merging this pull request may close these issues.