2color
commented
Mar 6, 2026
The goal is to have a single content hash that represents a directory of files, such that verifying that hash verifies the entire contents.

This matters for build outputs, software distributions, large datasets, and website archives: any case where you need to verify that a collection of files hasn't changed. A naive approach like hashing a tarball is fragile: tar archives encode metadata (timestamps, permissions, ordering) that varies between machines, producing different hashes for identical file contents. Content addressing solves this, but the choice of format has real consequences, particularly for overhead, determinism, language support and existing tooling, and whether you can fetch subsets without downloading the whole thing. These differences compound as dataset size grows: what's negligible at megabyte scale (a few extra bytes of framing, an extra round of parsing per block) becomes a meaningful cost at terabyte scale across millions of files.
Add a note here that creating a tar isn't practical for large datasets where you cannot afford to store two copies.
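To make the tarball fragility in the quoted paragraph concrete, here is a minimal Python sketch. It builds two tar archives from identical files that differ only in modification time and compares their SHA-256 digests. The `content_digest` function is a hypothetical names-plus-contents scheme used purely for contrast; it is not the encoding used by UnixFS, iroh collections, or DASL/MASL.

```python
import hashlib
import io
import tarfile


def tar_digest(files: dict[str, bytes], mtime: int) -> str:
    """SHA-256 of an in-memory tar archive whose entries all use `mtime`."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(files):
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            info.mtime = mtime  # metadata that leaks into the archive bytes
            tar.addfile(info, io.BytesIO(files[name]))
    return hashlib.sha256(buf.getvalue()).hexdigest()


def content_digest(files: dict[str, bytes]) -> str:
    """Hypothetical digest over sorted names and file contents only."""
    h = hashlib.sha256()
    for name in sorted(files):
        encoded = name.encode("utf-8")
        h.update(len(encoded).to_bytes(8, "big"))  # length prefix avoids ambiguity
        h.update(encoded)
        h.update(hashlib.sha256(files[name]).digest())
    return h.hexdigest()


files = {"index.html": b"<h1>hi</h1>", "js/app.js": b"console.log(1)"}

# Same contents, timestamps one minute apart: the tar digests differ.
tar_a = tar_digest(files, mtime=1_700_000_000)
tar_b = tar_digest(files, mtime=1_700_000_060)
print(tar_a == tar_b)

# The content-only digest ignores timestamps and insertion order.
reordered = dict(reversed(list(files.items())))
print(content_digest(files) == content_digest(reordered))
```

The sketch holds everything in memory for brevity; a real dataset would be streamed file by file, which also avoids the second on-disk copy that a tarball requires.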
vmx
reviewed
Mar 9, 2026
Comment on lines +126 to +128
│ "assets/style.css" │ each string is varint-length
│ "js/app.js" │ prefixed + raw UTF-8 bytes
│ "index.html" │
The right edge is one space off. Same on the diagram below.
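For readers following the quoted diagram, the per-string framing it describes (a varint length followed by raw UTF-8 bytes) can be sketched in Python as below. This assumes an LEB128-style unsigned varint and covers only the framing of individual strings; any surrounding header or element count is deliberately left out, and the function names are illustrative rather than taken from any library.

```python
def encode_varint(n: int) -> bytes:
    """Unsigned LEB128: 7 payload bits per byte, high bit set on all but the last."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string(s: str) -> bytes:
    """Varint byte length followed by the raw UTF-8 bytes."""
    raw = s.encode("utf-8")
    return encode_varint(len(raw)) + raw


def encode_names(names: list[str]) -> bytes:
    """Concatenate the framed strings in the order given."""
    return b"".join(encode_string(name) for name in names)


names = ["assets/style.css", "js/app.js", "index.html"]
framed = encode_names(names)
print(framed.hex())
```

With names this short, each length fits in a single byte, so the framing overhead is one byte per string.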
### Characteristics

- **No metadata pollution.** Unlike tar/zip, there are no timestamps, permissions, or ownership fields. Two directories with identical file names and contents always produce the same hash, regardless of when or where they were produced.
- **Positional, tag-free encoding.** Postcard serializes fields in declaration order with no field numbers or type tags. The `"CollectionV0."` magic header handles versioning.
[LanguageTool] reported by reviewdog 🐶
“Hash” is a singular noun. It appears that the verb form is incorrect. (PCT_SINGULAR_NOUN_PLURAL_VERB_AGREEMENT[1])
Suggestions: fetches
Rule: https://community.languagetool.org/rule/show/PCT_SINGULAR_NOUN_PLURAL_VERB_AGREEMENT?lang=en-US&subId=1
Category: GRAMMAR
Describe your changes
This adds a new guide comparing UnixFS, iroh collections, and DASL/MASL for content-addressing directories of files, covering overhead, determinism, subsetting, and ecosystem support.
After spending a lot of time comparing the three for a different use case, I thought it would be useful to share these insights, and embrace the plurality of the IPFS ecosystem.
Checklist before merging