Add guide about content addressing folder #2262

Open
2color wants to merge 6 commits into main from
content-addressing-folders

@2color
Member

@2color 2color commented Mar 6, 2026

Describe your changes

This is a new comparison guide of UnixFS, iroh collections, and DASL/MASL for content addressing directories of files, covering overhead, determinism, subsetting, and ecosystem support.

After spending a lot of time comparing the three for a different use case, I thought it would be useful to share these insights, and embrace the plurality of the IPFS ecosystem.

Checklist before merging

  • Passing all required checks (The beta Check Markdown links for modified files check is not required)

@github-actions
Contributor

github-actions bot commented Mar 6, 2026

🚀 Build Preview on IPFS ready

@2color 2color force-pushed the content-addressing-folders branch from 41b88fb to 38077b0 on March 6, 2026 13:36

The goal is to have a single content hash that represents a directory of files, such that verifying that hash verifies the entire contents.

This matters for build outputs, software distributions, large datasets, website archives — any case where you need to verify that a collection of files hasn't changed. A naive approach like hashing a tarball is fragile: tar archives encode metadata (timestamps, permissions, ordering) that vary between machines, producing different hashes for identical file contents. Content addressing solves this, but the choice of format has real consequences — particularly for overhead, determinism, language support and existing tooling, and whether you can fetch subsets without downloading the whole thing. These differences compound as dataset size grows: what's negligible at megabyte scale — a few extra bytes of framing, an extra round of parsing per block — becomes a meaningful cost at terabyte scale across millions of files.
Member Author

Add a note here that creating a tarball isn't practical for large datasets, where you cannot afford to store two copies of the data.
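
The tarball fragility described in the paragraph under review can be illustrated with a small sketch. The `manifest_digest` scheme below is hypothetical and is not UnixFS, iroh collections, or DASL/MASL; it only shows that hashing sorted (name, content hash) pairs removes the dependence on timestamps and entry order that a tar archive has.

```python
import hashlib
import io
import tarfile


def tar_digest(files: dict[str, bytes], mtime: int) -> str:
    """SHA-256 of an in-memory tar archive whose entries all carry `mtime`."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():  # insertion order becomes archive order
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = mtime
            tar.addfile(info, io.BytesIO(data))
    return hashlib.sha256(buf.getvalue()).hexdigest()


def manifest_digest(files: dict[str, bytes]) -> str:
    """Hypothetical scheme: SHA-256 over sorted (name, content hash) pairs.

    No timestamps, permissions, or ownership fields enter the hash.
    """
    h = hashlib.sha256()
    for name in sorted(files):
        encoded = name.encode("utf-8")
        h.update(len(encoded).to_bytes(8, "big"))  # length prefix avoids ambiguity
        h.update(encoded)
        h.update(hashlib.sha256(files[name]).digest())
    return h.hexdigest()


files = {"index.html": b"<h1>hi</h1>", "js/app.js": b"console.log(1)"}
print(tar_digest(files, mtime=0) == tar_digest(files, mtime=1))  # same contents, different mtime
print(manifest_digest(files) == manifest_digest(dict(reversed(list(files.items())))))
```

Note that `tar_digest` also has to serialize the whole archive before hashing, which is the second-copy cost mentioned in the comment above, whereas `manifest_digest` only needs each file's hash.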

@2color
Member Author

2color commented Mar 6, 2026

  • Add more references to other pages, such as the lifecycle page.
  • Add an opening section about the more abstract notion of a Merkle proof, with a link to ProtoSchool. Use the analogy of a club bouncer who holds the root hash: each guest proves their name with a real-world ID, plus the Merkle path from that name to the root.
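
A rough sketch of what that opening section could demonstrate, using the bouncer analogy. This is a generic binary Merkle tree over SHA-256, not the tree layout of any of the three formats in the guide; the function names, the `0x00`/`0x01` domain-separation prefixes, and the choice to pair an odd node with itself are all simplifying assumptions.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """All tree levels, leaf hashes first and the single root last."""
    level = [h(b"\x00" + leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # odd node is paired with itself
        level = [h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf to root, each flagged True if the sibling is on the left."""
    proof = []
    for level in merkle_levels(leaves)[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """The bouncer's check: recompute the path and compare with the known root."""
    node = h(b"\x00" + leaf)
    for sibling, sibling_is_left in proof:
        node = h(b"\x01" + (sibling + node if sibling_is_left else node + sibling))
    return node == root


guests = [b"alice", b"bob", b"carol"]
root = merkle_levels(guests)[-1][0]       # the only thing the bouncer stores
bob_path = merkle_proof(guests, 1)        # what bob presents alongside his ID
print(verify(b"bob", bob_path, root))
print(verify(b"mallory", bob_path, root))
```

The bouncer never needs the full guest list: the root hash plus a path of sibling hashes, one per tree level, is enough to accept or reject a name.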

@2color 2color requested review from lidel and vmx March 6, 2026 18:53
Comment on lines +126 to +128
│ "assets/style.css" │ each string is varint-length
│ "js/app.js" │ prefixed + raw UTF-8 bytes
│ "index.html" │
Member

The right edge is one space off. Same on the diagram below.

### Characteristics

- **No metadata pollution.** Unlike tar/zip, there are no timestamps, permissions, or ownership fields. Two directories with identical file names and contents always produce the same hash, regardless of when or where they were produced.
- **Positional, tag-free encoding.** Postcard serializes fields in declaration order with no field numbers or type tags. The `"CollectionV0."` magic header handles versioning.
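
For readers unfamiliar with this style of encoding, here is a minimal sketch of the "varint-length prefixed + raw UTF-8 bytes" string layout named in the diagram. It assumes the varint is an unsigned LEB128-style integer (7 payload bits per byte, high bit as a continuation flag); the helper names are illustrative, and this does not reproduce the full collection format or its `"CollectionV0."` header.

```python
def encode_varint(n: int) -> bytes:
    """Unsigned LEB128-style varint: low 7 bits first, high bit set on all but the last byte."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_string(s: str) -> bytes:
    """Length prefix (in bytes, not characters) followed by the raw UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_varint(len(data)) + data


def decode_string(buf: bytes, offset: int = 0) -> tuple[str, int]:
    """Decode one string starting at `offset`; return it with the offset of the next field."""
    length = 0
    shift = 0
    while True:
        byte = buf[offset]
        offset += 1
        length |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    end = offset + length
    return buf[offset:end].decode("utf-8"), end


# Fields are simply concatenated in order: no field numbers, no type tags.
names = ["assets/style.css", "js/app.js", "index.html"]
blob = b"".join(encode_string(n) for n in names)

offset, decoded = 0, []
while offset < len(blob):
    name, offset = decode_string(blob, offset)
    decoded.append(name)
print(decoded == names)
```

Because nothing but position identifies a field, the decoder has to know the schema in advance, which is why a version marker is needed somewhere in the format.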
Contributor

[LanguageTool] reported by reviewdog 🐶
“Hash” is a singular noun. It appears that the verb form is incorrect. (PCT_SINGULAR_NOUN_PLURAL_VERB_AGREEMENT[1])
Suggestions: fetches
Rule: https://community.languagetool.org/rule/show/PCT_SINGULAR_NOUN_PLURAL_VERB_AGREEMENT?lang=en-US&subId=1
Category: GRAMMAR

2 participants