Skip to content

feat: encrypted replication for private subtrees (B1/B2/B3) for #18#36

Open
beardthelion wants to merge 57 commits into
Gitlawb:mainfrom
beardthelion:feat/encrypted-replication
Open

feat: encrypted replication for private subtrees (B1/B2/B3) for #18#36
beardthelion wants to merge 57 commits into
Gitlawb:mainfrom
beardthelion:feat/encrypted-replication

Conversation

@beardthelion

@beardthelion beardthelion commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Adds encrypted replication for path-scoped private subtrees (issue #18), so a repo can be public while a withheld subtree stays durable and recoverable by authorized readers without ever exposing plaintext to the network.

What this does

Three layers, each building on the last:

  • B1: encrypt-then-pin. On push, withheld blobs are sealed as XChaCha20-Poly1305 envelopes to the recipient DIDs and pinned to IPFS instead of being dropped. gl clone transparently recovers and installs the blobs a reader is authorized to decrypt; everyone else just sees the public tree.
  • B2: peer-carried replication. Peer mirrors fetch, pin, and re-serve the encrypted envelopes, so durability no longer depends on the origin node staying up.
  • B3: Arweave manifest anchor. Each push anchors a small manifest (oid → cid → recipients) to Arweave via Irys. If a node loses its database entirely, gl reconstructs the recovery index from public Arweave + IPFS gateways and still restores the authorized blobs. Recovery is bounded by a 30s timeout per gateway.

Plaintext for a withheld path is never replicated. Non-recipients can reach the envelope and the manifest but cannot decrypt anything.

Dependencies

This builds on the path-scoped visibility stack and does not stand alone. It expects these to land first:

Because none of those are merged yet, the diff here carries their changes too. The new work in this PR is the encrypted-replication commits (core envelope crypto, encrypted_pin, the encrypted-blobs endpoints, arweave manifest anchoring, and the gl recovery path). It will shrink to just that once the prerequisites merge.

Tests

Hermetic coverage for the new paths: envelope seal/open round-trips, recipient resolution, manifest anchoring and parsing (mocked Irys/GraphQL), and read-path clone recovery over mocked Arweave + IPFS gateways for both an authorized recipient and a non-recipient. Full suites green.

Live end-to-end across two nodes with Kubo + Irys is the remaining manual step before this is production-ready.

Summary by CodeRabbit

  • New Features
    • Encrypted "withheld" blob support with per-recipient recovery and replication.
    • Server endpoints to list, fetch, and replicate encrypted blobs and to expose withheld-path globs.
    • Partial "clean" clone (gl clone) that excludes withheld paths and supports authorized recovery via node or Arweave/IPFS manifests.
    • Mirror/sync and pinning flows updated to replicate encrypted envelopes and respect withheld visibility.

…al clone

upload_pack_excluding emitted a v2 packfile section, but info_refs
advertises v0, so real clients negotiated v0 and rejected the response
with 'expected ACK/NAK, got packfile'. Frame the v0 stateless-rpc shape
instead (NAK, then the pack via side-band-64k when offered).

Add an end-to-end test that stands up info_refs + upload_pack_excluding
and runs a real git partial clone, asserting the withheld blob's bytes
never reach the client while its tree entry and SHA stay visible. A stock
full clone cannot consume the pack (it is not closed under reachability,
so fetch fails the connectivity check); a partial clone is required.
…tion choice

Add a real-git test that partial-clones, pushes a new commit server-side,
then fetches: the new object arrives and the withheld blob stays absent.
This pins down that ignoring have/want negotiation (always sending a
self-contained pack of all refs minus withheld, with NAK) is correct for
both clone and fetch; the only cost is a fetch re-sends the full object
set. Refactor the real-git tests onto a shared server harness and document
the negotiation decision in code and in the plan's follow-ups.
Move the two blocking git shell-outs in the filtered upload-pack path off
the async worker thread, matching the tokio::process / spawn_blocking usage
already in this file: build_filtered_pack (rev-list + pack-objects) and
withheld_blob_oids (per-ref ls-tree) now run inside spawn_blocking so a large
repo cannot stall the tokio runtime. Behavior is unchanged.

Also fix the Task 0 findings block in the Phase 3 plan: it still recorded v2
packfile framing, which is the exact path that failed against a real client
and was corrected to v0. The block now documents the shipped v0 contract.
Drop a stray trailing code fence flagged by markdownlint (MD040).

The speculative ls-tree timeout and the public/no-rules fast-path from the
review are intentionally left out: the timeout guards against adversarial
repos we do not yet host, and the fast-path is a micro-optimization not worth
the extra branch right now.
kevincodex1 asked to keep the superpowers planning docs out of the repo. The
Phase 3 plan was scaffolding for this change, not something the project needs
to carry. Removing it leaves only the code and tests in the PR.
Three fixes from the PR Gitlawb#33 review:

- withheld_paths now applies the whole-repo "/" read gate (returns
  repo-not-found when the caller cannot read the root), matching the git read
  endpoints. Without it the endpoint disclosed a private repo's existence and
  path layout to unauthorized callers. The withheld_globs doc already assumed
  this gate existed; now it does.

- A nested allow under a denied parent (e.g. "/secret/public/**" allowed,
  "/secret/**" denied) was over-withheld: the client sparse-excluded the whole
  parent and hid paths the caller may read. The endpoint now also returns a
  "reinclude" list (allowed globs strictly under a denied one) and gl clone
  re-includes them in the sparse spec after the excludes.

- Wildcard-free globs like "/docs/private" match both the exact path and a
  subtree (per glob_matches), but the client only emitted the subtree exclude.
  sparse_patterns now emits both "/docs/private" and "/docs/private/".

Verified the exclude-then-reinclude sparse ordering checks out cleanly with
real git, plus unit tests for reincluded_globs, the nested re-include, the
exact-path exclude, and sparse_patterns.
…eld-path errors

split_once('/') accepted owner/name/extra, smuggling a path segment into
the repo name that then flowed into the API path and clone URL; reject it.

fetch_withheld swallowed every network/auth/5xx/JSON error into an empty
result, dropping to a stock clone that the node refuses once blobs are
withheld. Now only 404/501 (endpoint unsupported) fall back to empty; the
rest propagate so the real cause surfaces.
The receive-pack replication chokepoint called withheld_blob_oids
directly on the tokio worker, where its blocking git ls-tree walk can
stall the runtime for repos with many refs. Wrap it in spawn_blocking
to match the upload-pack serve path.
Add a regression test for deny /secret, allow /secret/public, deny
/secret/public/admin and clarify the sparse-checkout comment. git does
not re-traverse an explicitly excluded directory, so emitting all
excludes before re-includes keeps the deepest deny in force; the broader
parent re-include does not resurrect it.
…hema

Read the default branch from the local origin/HEAD symref clone already
set, instead of parsing the localized, network-dependent output of
git remote show origin. Deserialize the withheld-paths body into a typed
struct so a missing or mistyped withheld/reinclude field is a hard error
rather than silently becoming an empty list, which would mask a server
regression behind a later clone failure.
Give the sync worker's HTTP client a 30s timeout so a stalled peer
withheld-paths lookup cannot hang the worker loop. When a repo that was
mirrored as a promisor (mode B) becomes fully public, fetch_repo now
clears remote.origin.promisor and partialclonefilter and refetches, so
the once-withheld blobs are backfilled instead of the mirror staying
permanently partial.
At the push chokepoint, after pinning withheld objects, resolve each
withheld blob's recipient DIDs and seal it to their Ed25519 keys with
seal_blob, pinning the ciphertext to IPFS and recording it in
encrypted_blobs. Best-effort per blob: failures are logged and skipped,
never pinned in plaintext. Pinata replication is unchanged; B1
encrypts to IPFS only.

Adds ed25519-dalek as a direct dependency of gitlawb-node (it was only
declared in the workspace Cargo.toml).
Skip re-pinning only when an existing envelope already covers exactly the
current recipients. A reader added to a rule after the first pin now gets
a re-seal on the next push instead of being permanently locked out.
Reader removal stays non-retroactive (the old envelope is already public).
…or present-check

Add two hermetic integration tests for recover_from_arweave that drive the
full read path over mocked Arweave GraphQL + IPFS gateways: discover the
manifest, fetch it, fetch the envelope, decrypt, and install the withheld
blob. One covers an authorized recipient (blob installed), the other a
non-recipient (nothing recovered). Both simulate origin death by removing
the promisor remote and enable uploadpack.allowFilter so the blob is truly
withheld over file://.

Also harden the local presence check in recover_from_arweave with
GIT_NO_LAZY_FETCH=1 and .output() so the expected 'missing object' case does
not trigger a wasted promisor fetch or leak git stderr to the user.
@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 6c3c792b-a9ec-4734-bd66-8167547bfc8b

📥 Commits

Reviewing files that changed from the base of the PR and between 2131b0c and a939a65.

📒 Files selected for processing (2)
  • crates/gitlawb-node/src/db/mod.rs
  • crates/gl/src/clone.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/gitlawb-node/src/db/mod.rs
  • crates/gl/src/clone.rs

📝 Walkthrough

Walkthrough

Adds envelope encryption for withheld blobs, visibility-driven blob withholding and filtered upload-pack serving, DB persistence and encrypted-pin/replication, Arweave manifest anchoring, sync worker promisor-aware mirroring, and a client gl clone with partial-clone recovery paths.

Changes

Encrypted Blob End-to-End Support

Layer / File(s) Summary
Core encryption and identity infrastructure
crates/gitlawb-core/Cargo.toml, crates/gitlawb-core/src/encrypt.rs, crates/gitlawb-core/src/identity.rs, crates/gitlawb-core/src/lib.rs
XChaCha20-Poly1305 envelope encryption with per-recipient X25519-wrapped content keys derived from Ed25519 identity keys; Keypair::seed_bytes() added and encrypt module exported.
Visibility-aware blob withholding and filtering
crates/gitlawb-node/src/visibility.rs, crates/gitlawb-node/src/api/visibility.rs
Computes withheld and reinclude globs for clean partial clones and exposes them via a gated withheld-paths endpoint.
Visibility-aware git pack withholding
crates/gitlawb-node/src/git/visibility_pack.rs, crates/gitlawb-node/src/git/smart_http.rs, crates/gitlawb-node/src/api/repos.rs
Enumerates blob OIDs/paths, determines withheld OIDs and recipients, builds filtered pack omitting withheld blobs, and routes upload-pack to filtered or normal pack; comprehensive protocol and e2e tests.
Database schema and encrypted blob queries
crates/gitlawb-node/src/db/mod.rs
Adds migration v4 encrypted_blobs table and DB methods to record, list (caller-scoped and repo-wide), lookup CID conditionally by recipient, and fetch recipients.
Encrypt-and-pin withheld blobs
crates/gitlawb-node/src/encrypted_pin.rs, crates/gitlawb-node/src/main.rs
Resolves recipient DIDs to keys, seals blobs with seal_blob, pins envelopes to IPFS, upserts encrypted-blob rows in DB, and returns recorded (oid,cid,recipients) for successful items.
IPFS/Pinata pinning integration
crates/gitlawb-node/src/ipfs_pin.rs, crates/gitlawb-node/src/pinata.rs
Adds ipfs_pin::cat helper and updates bulk pinning (pin_new_objects) to accept a withheld set and filter candidate objects via replicable_objects before pinning.
Server API endpoints for encrypted blob access
crates/gitlawb-node/src/api/encrypted.rs, crates/gitlawb-node/src/api/mod.rs, crates/gitlawb-node/src/server.rs
Adds list_encrypted_blobs (caller-scoped), get_encrypted_blob (fetches envelope bytes from IPFS), and replicate_encrypted_blobs (repo-wide metadata) and wires routes into router.
Push-phase announce gating & Arweave manifest anchoring
crates/gitlawb-node/src/api/repos.rs, crates/gitlawb-node/src/arweave.rs
Reworks receive-pack to compute announce once per push and gate pinning/encryption/Arweave anchoring; adds EncryptedManifest and anchor_encrypted_manifest to post per-push manifests to Irys.
Sync worker promisor mirror and replication
crates/gitlawb-node/src/sync.rs
Classifies mirrors (Promisor vs Plain) via withheld-paths, performs mode-aware clone/fetch with promisor config handling, and replicates encrypted envelopes from origin by fetching/pinning envelopes and recording DB rows.
gl clone command with partial clone and recovery
crates/gl/src/clone.rs, crates/gl/src/main.rs
Adds gl clone with promisor partial clone and sparse-checkout patterns, then attempts node-based encrypted-blob recovery (and Arweave/IPFS fallback) to restore missing blobs into the checkout.
Local ignore rule
.gitignore
Ignores local docs/superpowers/ planning/scratch files.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Gitlawb/node#25: Related prior work extending path-scoped visibility enforcement used by withheld-paths and related handlers.

Suggested reviewers

  • kevincodex1

Poem

🐰 I hid the blobs beneath a spell,

X25519 guards the secret well,
Promisor clones skip what’s sealed,
But rightful keys restore the field,
A bunny hums — the repo sleeps, all’s well.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately describes the main feature: encrypted replication for private subtrees across three implementation options (B1/B2/B3), with a reference to the associated issue #18.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (6)
crates/gitlawb-core/src/identity.rs (1)

55-59: 💤 Low value

Consider using Zeroizing for consistency with to_seed.

The new seed_bytes returns the raw seed without Zeroizing, while the existing to_seed wraps it. Since the seed is sensitive key material, consider returning Zeroizing<[u8; 32]> here as well for consistent memory hygiene. The caller in encrypt.rs can dereference it.

♻️ Suggested change
-    pub fn seed_bytes(&self) -> [u8; 32] {
-        self.signing_key.to_bytes()
+    pub fn seed_bytes(&self) -> Zeroizing<[u8; 32]> {
+        Zeroizing::new(self.signing_key.to_bytes())
     }

Then in encrypt.rs:

-    let my_x = XSecret::from(x25519_secret_from_seed(&keypair.seed_bytes()));
+    let my_x = XSecret::from(x25519_secret_from_seed(&keypair.seed_bytes()));

(No change needed in encrypt.rs since Zeroizing<[u8; 32]> derefs to [u8; 32].)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-core/src/identity.rs` around lines 55 - 59, seed_bytes
currently returns a plain [u8; 32] exposing sensitive seed material; change its
signature to pub fn seed_bytes(&self) -> Zeroizing<[u8; 32]> and return
Zeroizing::new(self.signing_key.to_bytes()) and add use zeroize::Zeroizing at
the top; keep to_seed as-is and note callers (e.g., encrypt.rs) need no changes
because Zeroizing<[u8; 32]> derefs to [u8; 32].
crates/gitlawb-node/src/encrypted_pin.rs (1)

49-63: 💤 Low value

Consider logging warnings for silent failures.

The read_object and pin_git_object failures silently continue without logging, unlike seal_blob and record_encrypted_blob failures which log warnings. For operational visibility, consider adding warnings here as well.

♻️ Suggested change
         let data = match crate::git::store::read_object(repo_path, oid) {
             Ok(Some((_t, bytes))) => bytes,
-            _ => continue,
+            Ok(None) => {
+                tracing::warn!(oid = %oid, "blob not found in git store; skipping");
+                continue;
+            }
+            Err(e) => {
+                tracing::warn!(oid = %oid, err = %e, "read_object failed; skipping");
+                continue;
+            }
         };
         // ... seal_blob ...
         let cid = match crate::ipfs_pin::pin_git_object(ipfs_api, oid, &envelope).await {
             Ok(c) if !c.is_empty() => c,
-            _ => continue,
+            Ok(_) => {
+                tracing::warn!(oid = %oid, "IPFS pin returned empty CID; skipping");
+                continue;
+            }
+            Err(e) => {
+                tracing::warn!(oid = %oid, err = %e, "IPFS pin failed; skipping");
+                continue;
+            }
         };
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/encrypted_pin.rs` around lines 49 - 63, Add warning
logs for the branches that currently silently continue so operational failures
are visible: when crate::git::store::read_object(repo_path, oid) returns Err or
Ok(None), emit a tracing::warn! including oid and repo_path before continuing;
likewise, when crate::ipfs_pin::pin_git_object(ipfs_api, oid, &envelope).await
returns Err or an empty cid, emit a tracing::warn! including oid and ipfs_api
(or identifying info) and the result before continuing; keep the existing
tracing style used for seal_blob and record_encrypted_blob to stay consistent.
crates/gl/src/clone.rs (2)

241-336: ⚡ Quick win

Silent failure on encrypted-blobs list fetch hinders troubleshooting.

Lines 255-261 return an empty Vec when the /encrypted-blobs request fails, without logging the failure reason. Users won't know whether recovery was skipped because:

  1. No encrypted blobs exist (legitimate),
  2. Network error reaching the node,
  3. Authorization failure, or
  4. Node internal error.

Adding a debug or info log on non-success responses would help users diagnose recovery issues without changing the best-effort semantics.

♻️ Suggested improvement: log fetch failures
     let resp = match client
         .get_signed(&format!("/api/v1/repos/{owner}/{name}/encrypted-blobs"))
         .await
     {
         Ok(r) if r.status().is_success() => r,
-        _ => return Ok(vec![]),
+        Ok(r) => {
+            tracing::debug!(status = %r.status(), "encrypted-blobs fetch failed; skipping node-based recovery");
+            return Ok(vec![]);
+        }
+        Err(e) => {
+            tracing::debug!(err = %e, "encrypted-blobs request error; skipping node-based recovery");
+            return Ok(vec![]);
+        }
     };
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gl/src/clone.rs` around lines 241 - 336, The initial fetch of
"/api/v1/repos/{owner}/{name}/encrypted-blobs" in recover_encrypted_blobs
silently returns Ok(vec![]) on any non-success result; update the match to log
the failure details (status code and/or response body/error) before returning so
operators can distinguish "no blobs" from network/auth/node errors — capture the
Err or non-success branch from client.get_signed(...) (and the response when
present) and emit a processLogger/info/debug or eprintln! with context including
node, owner, name and the response status/body or error, then continue returning
Ok(vec![]) to preserve best-effort behavior.

204-225: 💤 Low value

404 response ambiguity: endpoint-not-implemented vs repo-not-found.

Line 216 treats HTTP 404 and 501 as "no withheld paths" for backward compatibility with older nodes that don't implement the endpoint. However, a 404 could also mean the repository {owner}/{name} doesn't exist.

When the repo is missing, the function returns Ok((Vec::new(), Vec::new())) and proceeds to clone, only to fail later at the git fetch step with a less clear error. This indirection makes typos in repo names harder to diagnose.

Consider logging a warning when receiving 404 to help users distinguish between "old node" and "typo in repo name."

♻️ Optional improvement: log 404 for easier debugging
     let resp = match resp {
         Ok(r) if r.status().is_success() => r,
-        Ok(r) if matches!(r.status().as_u16(), 404 | 501) => return Ok((Vec::new(), Vec::new())),
+        Ok(r) if matches!(r.status().as_u16(), 404 | 501) => {
+            tracing::debug!(
+                status = r.status().as_u16(),
+                "withheld-paths endpoint returned {}: assuming no withheld paths (old node or repo not found)",
+                r.status()
+            );
+            return Ok((Vec::new(), Vec::new()));
+        }
         Ok(r) => bail!("withheld-paths lookup failed: {}", r.status()),
         Err(err) => return Err(err).context("fetching withheld paths"),
     };
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gl/src/clone.rs` around lines 204 - 225, In fetch_withheld, when
treating HTTP 404/501 as "no withheld paths" broadened for backward
compatibility, add a warning log specifically for 404 responses so
repo-not-found vs unimplemented endpoint is visible; update the match that
currently returns Ok((Vec::new(), Vec::new())) for status 404 to call the logger
(e.g., process_logger.warn or crate logger) with context including node, owner,
name and the 404 status before returning empty vectors, while keeping 501 silent
for compatibility; reference the fetch_withheld function and the
NodeClient.get/NodeClient.get_signed response handling to locate where to insert
the warning.
crates/gitlawb-node/src/api/encrypted.rs (1)

12-32: 💤 Low value

Consider returning 401 for unauthenticated callers instead of empty list.

Line 17 defaults to "" when no authentication is present. Unauthenticated callers will receive {"blobs": []} (since no blob is authorized to DID ""), which is indistinguishable from "authenticated but authorized for zero blobs."

Returning a 401 Unauthorized for missing auth would make the security boundary more explicit and help clients distinguish authentication failures from empty result sets.

♻️ Optional improvement: explicit auth requirement
 pub async fn list_encrypted_blobs(
     State(state): State<AppState>,
     auth: Option<Extension<AuthenticatedDid>>,
     Path((owner, repo)): Path<(String, String)>,
 ) -> Result<Json<serde_json::Value>> {
-    let caller = auth.as_ref().map(|e| e.0 .0.as_str()).unwrap_or("");
+    let caller = auth
+        .as_ref()
+        .map(|e| e.0 .0.as_str())
+        .ok_or_else(|| AppError::Unauthorized("authentication required".into()))?;
     let record = state
         .db
         .get_repo(&owner, &repo)
         .await?
         .ok_or_else(|| AppError::RepoNotFound(format!("{owner}/{repo}")))?;
     let rows = state
         .db
         .list_encrypted_blobs_for(&record.id, caller)
         .await?;
     let blobs: Vec<_> = rows
         .into_iter()
         .map(|(oid, cid)| serde_json::json!({ "oid": oid, "cid": cid }))
         .collect();
     Ok(Json(serde_json::json!({ "blobs": blobs })))
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/api/encrypted.rs` around lines 12 - 32, The handler
list_encrypted_blobs currently treats missing auth as caller = "" causing
unauthenticated requests to get an empty blob list; instead detect when auth is
None and return a 401 Unauthorized error. Update list_encrypted_blobs: replace
the caller defaulting logic with an early check on auth (the
Extension<AuthenticatedDid> argument) and return
Err(AppError::Unauthorized(...)) or the project’s equivalent unauthorized error
when auth.is_none(); keep the rest of the function (db lookups and blob mapping)
unchanged so only authenticated requests proceed.
crates/gitlawb-node/src/ipfs_pin.rs (1)

76-86: ⚡ Quick win

IPFS error response body is discarded, hindering diagnostics.

Line 83 constructs the error message using only the HTTP status code. IPFS often returns helpful error details in the response body (e.g., "block not found", "timeout reading from datastore") that would aid debugging when recovery fails.

♻️ Suggested improvement: include response body in error
 pub async fn cat(ipfs_api: &str, cid: &str) -> Result<Vec<u8>> {
     if ipfs_api.is_empty() {
         return Err(anyhow::anyhow!("IPFS not configured"));
     }
     let url = format!("{}/api/v0/cat?arg={}", ipfs_api.trim_end_matches('/'), cid);
     let resp = reqwest::Client::new().post(&url).send().await?;
     if !resp.status().is_success() {
-        return Err(anyhow::anyhow!("ipfs cat {cid}: {}", resp.status()));
+        let status = resp.status();
+        let body = resp.text().await.unwrap_or_default();
+        return Err(anyhow::anyhow!("ipfs cat {cid}: {status}: {body}"));
     }
     Ok(resp.bytes().await?.to_vec())
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/ipfs_pin.rs` around lines 76 - 86, The cat function
currently returns an error using only resp.status(); update the failure path in
fn cat(ipfs_pin::cat / cat) to read the response body (e.g., let body =
resp.text().await.unwrap_or_default()) when resp.status().is_success() is false
and include that body (or a truncated version) in the anyhow::anyhow! error
message so IPFS error details (like "block not found") are preserved; ensure you
await resp.text() only on the error branch (so you don't consume the body
prematurely) and handle any text read errors gracefully (fallback to empty or
placeholder) before constructing the final error.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/gitlawb-node/src/api/encrypted.rs`:
- Around line 36-56: The handler get_encrypted_blob currently maps a None from
state.db.encrypted_blob_cid(...) to AppError::RepoNotFound, which is incorrect;
change that error to a more specific variant such as
AppError::BlobNotFound(format!("{owner}/{repo}/{oid}")) or
AppError::AccessDenied when you can detect the caller is unauthorized. Locate
the call in get_encrypted_blob and replace the ok_or_else(||
AppError::RepoNotFound(...)) with ok_or_else(|| AppError::BlobNotFound(...)) or,
if you can distinguish authorization from missing blob within encrypted_blob_cid
(or by adding a helper), return AppError::AccessDenied for unauthorized callers
and AppError::BlobNotFound for genuinely missing OIDs. Ensure you update any
error variant names you use to match existing AppError variants (or add them to
AppError if missing).

In `@crates/gitlawb-node/src/db/mod.rs`:
- Around line 1663-1684: Replace the current silent JSON fallback for recipients
(serde_json::from_str(&recipients).unwrap_or_default()) with explicit
deserialization error handling: in list_encrypted_blobs_for,
list_all_encrypted_blobs, encrypted_blob_cid, and encrypted_blob_recipients,
call serde_json::from_str(&recipients) and on Err log a clear corruption message
(including repo_id and oid/cid where available) and return a propagated error
instead of an empty Vec so callers see the failure; use map_err / context to
convert the serde_json error into the function's Result error type and ensure
the log distinguishes malformed recipients JSON from other DB errors.

In `@crates/gl/src/clone.rs`:
- Around line 563-576: The code in clone.rs silently drops I/O and git errors
when updating the sparse-checkout and running checkout (the read_to_string call,
write to .git/info/sparse-checkout, and git(&dest, &["checkout", "--", "."])
invocations), which can leave recovered files unmaterialized; update this block
to handle the Results instead of using let _ = — e.g., check and log (or return)
errors from std::fs::read_to_string(&spec), std::fs::write(&spec, s), and the
git(...) call, and emit a clear warning mentioning the affected paths/spec so
users know sparse-checkout or checkout failed and recovered files may be missing
(use the local variables spec and paths to construct the message).

---

Nitpick comments:
In `@crates/gitlawb-core/src/identity.rs`:
- Around line 55-59: seed_bytes currently returns a plain [u8; 32] exposing
sensitive seed material; change its signature to pub fn seed_bytes(&self) ->
Zeroizing<[u8; 32]> and return Zeroizing::new(self.signing_key.to_bytes()) and
add use zeroize::Zeroizing at the top; keep to_seed as-is and note callers
(e.g., encrypt.rs) need no changes because Zeroizing<[u8; 32]> derefs to [u8;
32].

In `@crates/gitlawb-node/src/api/encrypted.rs`:
- Around line 12-32: The handler list_encrypted_blobs currently treats missing
auth as caller = "" causing unauthenticated requests to get an empty blob list;
instead detect when auth is None and return a 401 Unauthorized error. Update
list_encrypted_blobs: replace the caller defaulting logic with an early check on
auth (the Extension<AuthenticatedDid> argument) and return
Err(AppError::Unauthorized(...)) or the project’s equivalent unauthorized error
when auth.is_none(); keep the rest of the function (db lookups and blob mapping)
unchanged so only authenticated requests proceed.

In `@crates/gitlawb-node/src/encrypted_pin.rs`:
- Around line 49-63: Add warning logs for the branches that currently silently
continue so operational failures are visible: when
crate::git::store::read_object(repo_path, oid) returns Err or Ok(None), emit a
tracing::warn! including oid and repo_path before continuing; likewise, when
crate::ipfs_pin::pin_git_object(ipfs_api, oid, &envelope).await returns Err or
an empty cid, emit a tracing::warn! including oid and ipfs_api (or identifying
info) and the result before continuing; keep the existing tracing style used for
seal_blob and record_encrypted_blob to stay consistent.

In `@crates/gitlawb-node/src/ipfs_pin.rs`:
- Around line 76-86: The cat function currently returns an error using only
resp.status(); update the failure path in fn cat(ipfs_pin::cat / cat) to read
the response body (e.g., let body = resp.text().await.unwrap_or_default()) when
resp.status().is_success() is false and include that body (or a truncated
version) in the anyhow::anyhow! error message so IPFS error details (like "block
not found") are preserved; ensure you await resp.text() only on the error branch
(so you don't consume the body prematurely) and handle any text read errors
gracefully (fallback to empty or placeholder) before constructing the final
error.

In `@crates/gl/src/clone.rs`:
- Around line 241-336: The initial fetch of
"/api/v1/repos/{owner}/{name}/encrypted-blobs" in recover_encrypted_blobs
silently returns Ok(vec![]) on any non-success result; update the match to log
the failure details (status code and/or response body/error) before returning so
operators can distinguish "no blobs" from network/auth/node errors — capture the
Err or non-success branch from client.get_signed(...) (and the response when
present) and emit a processLogger/info/debug or eprintln! with context including
node, owner, name and the response status/body or error, then continue returning
Ok(vec![]) to preserve best-effort behavior.
- Around line 204-225: In fetch_withheld, when treating HTTP 404/501 as "no
withheld paths" broadened for backward compatibility, add a warning log
specifically for 404 responses so repo-not-found vs unimplemented endpoint is
visible; update the match that currently returns Ok((Vec::new(), Vec::new()))
for status 404 to call the logger (e.g., process_logger.warn or crate logger)
with context including node, owner, name and the 404 status before returning
empty vectors, while keeping 501 silent for compatibility; reference the
fetch_withheld function and the NodeClient.get/NodeClient.get_signed response
handling to locate where to insert the warning.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: b291abdf-f2b1-478d-aa85-b571c76c9cb1

📥 Commits

Reviewing files that changed from the base of the PR and between 6abaf1d and 2131b0c.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (24)
  • .gitignore
  • crates/gitlawb-core/Cargo.toml
  • crates/gitlawb-core/src/encrypt.rs
  • crates/gitlawb-core/src/identity.rs
  • crates/gitlawb-core/src/lib.rs
  • crates/gitlawb-node/Cargo.toml
  • crates/gitlawb-node/src/api/encrypted.rs
  • crates/gitlawb-node/src/api/mod.rs
  • crates/gitlawb-node/src/api/repos.rs
  • crates/gitlawb-node/src/api/visibility.rs
  • crates/gitlawb-node/src/arweave.rs
  • crates/gitlawb-node/src/db/mod.rs
  • crates/gitlawb-node/src/encrypted_pin.rs
  • crates/gitlawb-node/src/git/mod.rs
  • crates/gitlawb-node/src/git/smart_http.rs
  • crates/gitlawb-node/src/git/visibility_pack.rs
  • crates/gitlawb-node/src/ipfs_pin.rs
  • crates/gitlawb-node/src/main.rs
  • crates/gitlawb-node/src/pinata.rs
  • crates/gitlawb-node/src/server.rs
  • crates/gitlawb-node/src/sync.rs
  • crates/gitlawb-node/src/visibility.rs
  • crates/gl/src/clone.rs
  • crates/gl/src/main.rs

Comment thread crates/gitlawb-node/src/api/encrypted.rs
Comment thread crates/gitlawb-node/src/db/mod.rs
Comment thread crates/gl/src/clone.rs
db: parse_recipients now propagates a descriptive error instead of
defaulting corrupt recipients JSON to an empty list, which would have
denied authorized readers and handed peers incomplete metadata.

gl: clone recovery now warns when the sparse-checkout file cannot be read
or written, or when the post-recovery checkout fails, instead of silently
discarding those errors and claiming files were recovered.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant