
feat: introduce hybrid storage#33

Draft
mdhruvil wants to merge 12 commits into main from mdhruvil/hybrid-storage

Conversation

@mdhruvil
Owner

This PR rewrites the entire git backend to support a hybrid R2 + D1 architecture.

- add vitest
- add a stream buffer for exact-byte-count reads from a ReadableStream
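
For context, the exact-byte-count idea can be sketched in a few lines. This is a hypothetical minimal `StreamBuffer` — the class name, fields, and method signatures here are illustrative, not the PR's actual API:

```typescript
// Sketch: buffer chunks from a ReadableStream and hand back exactly `n`
// bytes at a time, pulling more chunks from the stream when needed.
class StreamBuffer {
  private chunks: Uint8Array[] = [];
  private reader: ReadableStreamDefaultReader<Uint8Array>;
  bytesRead = 0;

  constructor(stream: ReadableStream<Uint8Array>) {
    this.reader = stream.getReader();
  }

  private buffered(): number {
    return this.chunks.reduce((n, c) => n + c.length, 0);
  }

  // Read exactly `n` bytes, regardless of how the source chunked the data.
  async readExact(n: number): Promise<Uint8Array> {
    while (this.buffered() < n) {
      const { done, value } = await this.reader.read();
      if (done) {
        throw new Error(`stream ended with ${this.buffered()} of ${n} bytes`);
      }
      this.chunks.push(value);
    }
    // Compact buffered chunks and split off the first `n` bytes.
    const out = new Uint8Array(n);
    let filled = 0;
    while (filled < n) {
      const head = this.chunks[0];
      const take = Math.min(n - filled, head.length);
      out.set(head.subarray(0, take), filled);
      if (take === head.length) this.chunks.shift();
      else this.chunks[0] = head.subarray(take);
      filled += take;
    }
    this.bytesRead += n;
    return out;
  }
}
```

This is what lets a packfile parser read, say, the fixed 12-byte pack header in one call even when the network delivered it split across several chunks.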
@coderabbitai
Contributor

coderabbitai bot commented Jan 23, 2026

Review skipped — auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


@greptile-apps

greptile-apps bot commented Jan 23, 2026

Greptile Summary

This PR introduces a custom streaming Git packfile parser as the foundation for a hybrid R2 + D1 storage architecture. The parser handles packfile header parsing, object decompression, delta objects (both OFS and REF types), and SHA-1 checksum verification.

Key Changes:

  • Core Implementation: New PackfileParser class with streaming architecture using StreamBuffer for memory-efficient chunk processing
  • Hash Verification: Clever delayed-hashing strategy that excludes the trailing 20-byte checksum from hash computation
  • Testing: Comprehensive test suite that validates against real git packfiles using git verify-pack
  • CI/CD: Split CI workflow to run lint/test/typecheck before deployment approval
  • Documentation: Extensive docs on hybrid storage strategy, cost optimization (92% savings vs pure DO SQLite), and implementation roadmap
  • Cleanup: Removed unused header component and git submodules
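
The delayed-hashing trick can be illustrated standalone: withhold the most recent 20 bytes from the running hash, since they may turn out to be the pack's trailing SHA-1 checksum. A simplified sketch, not the PR's actual implementation:

```typescript
import { createHash } from "node:crypto";

// Sketch: hash a packfile stream while excluding the trailing 20-byte
// SHA-1 checksum, without knowing in advance where the stream ends.
// A byte only enters the hash once 20 more bytes have arrived after it,
// so the final 20 bytes (the checksum) never enter the digest.
class DelayedHasher {
  private hash = createHash("sha1");
  private holdback = new Uint8Array(0); // last <=20 bytes seen, not yet hashed

  update(chunk: Uint8Array): void {
    const combined = new Uint8Array(this.holdback.length + chunk.length);
    combined.set(this.holdback);
    combined.set(chunk, this.holdback.length);
    if (combined.length > 20) {
      this.hash.update(combined.subarray(0, combined.length - 20));
      this.holdback = combined.subarray(combined.length - 20);
    } else {
      this.holdback = combined;
    }
  }

  // Returns [digest of everything except the last 20 bytes, those last 20 bytes].
  // Verification succeeds when the two are equal.
  finish(): [Uint8Array, Uint8Array] {
    return [new Uint8Array(this.hash.digest()), this.holdback];
  }
}
```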

Issues Found:

  • Critical logic bug in decompression error handling (line 366-373 in apps/web/src/git/pack/index.ts) - the condition for detecting decompression failure is mathematically incorrect and will cause unnecessary retries

Confidence Score: 3/5

  • This PR requires fixing the decompression error handling bug before merging
  • The core packfile parser implementation is well-designed with solid test coverage and proper streaming architecture. However, there's a logic bug in the decompression error detection that could cause performance issues by unnecessarily retrying decompression up to 100 times. The bug won't cause data corruption (tests validate correctness), but it could waste CPU cycles and delay error reporting. The rest of the changes (CI, docs, config) are sound.
  • Fix the decompression logic in apps/web/src/git/pack/index.ts before merging

Important Files Changed

| Filename | Overview |
| --- | --- |
| `apps/web/src/git/pack/index.ts` | New streaming packfile parser with SHA-1 verification and delta object support; has decompression error handling bug |
| `apps/web/src/lib/stream-buffer.ts` | Memory-efficient streaming buffer with chunk compaction and optional data callbacks |
| `apps/web/src/git/pack/pack.test.ts` | Comprehensive test suite validating parser against actual git packfiles using `git verify-pack` |
| `.github/workflows/pr-preview.yml` | CI workflow split into separate lint/test/typecheck job before preview deployment |
| `docs/data-storage.md` | Architecture documentation for hybrid R2+D1 storage strategy with cost analysis |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as Git Client
    participant Worker as Cloudflare Worker
    participant Cache as Cache API
    participant DO as Durable Object (SQLite)
    participant R2 as R2 Storage
    participant Parser as PackfileParser

    Client->>Worker: git clone/fetch request
    Worker->>Cache: Check for parsed commit/tree
    alt Cache Hit
        Cache-->>Worker: Return parsed object
        Worker-->>Client: Send object
    else Cache Miss
        Worker->>DO: Query object index (oid→pack_id, offset)
        DO-->>Worker: Return pack location
        Worker->>R2: Range request for object at offset
        R2-->>Worker: Stream compressed packfile chunk
        Worker->>Parser: Create PackfileParser(stream)
        Parser->>Parser: parseHeader() - validate signature
        Parser->>Parser: parseObject() - read type/size
        Parser->>Parser: decompressObject() - inflate zlib
        Parser->>Parser: Verify SHA-1 checksum
        Parser-->>Worker: Return parsed object
        Worker->>Cache: Store parsed object (1 year TTL)
        Worker-->>Client: Send object
    end

    Note over Parser: StreamBuffer manages chunks<br/>Memory-efficient compaction<br/>Delayed hash computation
```
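
The `parseObject()` step in the diagram decodes Git's variable-length object header: the first byte carries the object type in bits 4-6 and the low 4 bits of the size, and each continuation byte (MSB set) contributes 7 more size bits in little-endian order. A standalone sketch of that decoding, per the Git pack format specification:

```typescript
// Git packfile object-header decoding (pack format spec):
// byte 0: MSB = continuation, bits 6-4 = object type, bits 3-0 = size low bits;
// following bytes: MSB = continuation, low 7 bits appended little-endian.
const TYPES = ["invalid", "commit", "tree", "blob", "tag", "reserved", "ofs-delta", "ref-delta"];

function parseObjectHeader(
  buf: Uint8Array,
  offset: number
): { type: string; size: number; next: number } {
  let byte = buf[offset++];
  const type = TYPES[(byte >> 4) & 0b111];
  let size = byte & 0b1111;
  let shift = 4;
  while (byte & 0x80) {
    byte = buf[offset++];
    size += (byte & 0x7f) * 2 ** shift; // multiply, not <<, to avoid 32-bit overflow
    shift += 7;
  }
  return { type, size, next: offset };
}
```

For example, the two bytes `0xBC 0x12` decode to a blob of size 300: the first byte holds type 3 (blob) and low size bits 12, and the continuation byte adds 18 × 16 = 288.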


@greptile-apps greptile-apps bot left a comment


20 files reviewed, 1 comment


Comment thread: apps/web/src/git/pack/index.ts (outdated)
Comment on lines +366 to +373
```ts
if (
  this.buffer.bytesRead + bufferSize >=
  this._progress.bytesRead + bufferSize * 2
) {
  return Result.err(
    err instanceof Error ? err : new Error(String(err))
  );
}
```


logic: the condition logic is incorrect - this.buffer.bytesRead should equal this._progress.bytesRead since they're synchronized (lines 92, 104, 150), making this condition effectively bufferSize >= bufferSize * 2, which is only true when bufferSize <= 0

this means the error branch will almost never execute, and decompression will always retry up to maxAttempts even when more data won't help

Suggested change

```diff
-if (
-  this.buffer.bytesRead + bufferSize >=
-  this._progress.bytesRead + bufferSize * 2
-) {
-  return Result.err(
-    err instanceof Error ? err : new Error(String(err))
-  );
-}
+if (this.done && available.length < bufferSize) {
+  return Result.err(
+    err instanceof Error ? err : new Error(String(err))
+  );
+}
```

@mdhruvil mdhruvil marked this pull request as draft January 23, 2026 03:50
store git packs, objects, and refs in SQLite-backed DO, enabling faster browsing of commits, trees, and blobs

- schema: git_packs, git_object_index, git_refs tables
- drizzle migrations for DO with sql loader vite plugin
- test button on dashboard for verification
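
The three tables map onto the lookup flow (oid → pack location → bytes in R2). A rough SQLite sketch of what such a schema could look like — the column names beyond the three table names are guesses for illustration, not the PR's actual migration:

```sql
-- Hypothetical sketch only; column names are illustrative guesses.
CREATE TABLE git_packs (
  id INTEGER PRIMARY KEY,
  r2_key TEXT NOT NULL,        -- where the packfile lives in R2
  size INTEGER NOT NULL,
  created_at INTEGER NOT NULL
);

CREATE TABLE git_object_index (
  oid TEXT PRIMARY KEY,        -- SHA-1 of the object
  pack_id INTEGER NOT NULL REFERENCES git_packs(id),
  offset INTEGER NOT NULL      -- byte offset of the object inside the pack
);

CREATE TABLE git_refs (
  name TEXT PRIMARY KEY,       -- e.g. refs/heads/main
  oid TEXT NOT NULL
);
```

With an index shaped like this, serving a single object is one D1/DO lookup plus one R2 range read, which is what makes the diagram's cache-miss path cheap.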
Removes ~150 lines of custom error class implementation by adopting
the better-result package, reducing maintenance burden.

Also adds node_modules exclusion to tsconfigs to fix IDE performance.

Enables offloading large pack files to R2 object storage, allowing repos to exceed Durable Object size limits.
Users now see real-time progress during git push instead of silent waiting.

Packfiles are stored in R2, enabling larger repo support beyond DO SQLite limits.

Temporarily disabled while migrating:
- git clone/fetch (upload-pack returns 501)
- repo browsing UI (shows placeholder data)

BREAKING CHANGE: git clone/fetch unavailable until upload-pack implemented
@mdhruvil mdhruvil closed this Feb 26, 2026
@mdhruvil mdhruvil reopened this Feb 26, 2026