Skip to content

feat(pm): add native-git BFS resolution (2/3)#2623

Merged
elrrrrrrr merged 7 commits intonextfrom
pr2/bfs-git-resolution
Mar 5, 2026
Merged

feat(pm): add native-git BFS resolution (2/3)#2623
elrrrrrrr merged 7 commits intonextfrom
pr2/bfs-git-resolution

Conversation

@killagu
Copy link
Copy Markdown
Contributor

@killagu killagu commented Feb 25, 2026

Summary

Split from #2603 — Part 2 of 3 for git dependency resolution. Depends on #2622.

  • Add resolver/git.rs with full gix clone implementation + BFS resolver
  • Add GIT_CLONE_CACHE with tokio::sync::OnceCell for concurrent clone dedup
  • Add native-git feature flag (gix + tempfile optional deps)
  • Route non-registry specs through git resolver in BFS process_dependency
  • Add BuildDepsConfig.cache_dir with builder pattern

Key implementation details

  • clone_repo (async) wraps clone_repo_blocking via spawn_blocking
  • resolve_non_registry_dep, build_manifest_from_git_cache live in resolver/ (not traits/) per module boundary convention
  • build_manifest_from_git_cache uses tokio::fs::read_to_string (non-blocking I/O)
  • BuildDepsOptions::new() preserved — no public API removal
  • Step comments in service/api.rs unchanged

Cache layout

Git packages: <cache_dir>/<name>/<commit_sha>/ — same <name>/<key>/ layout as registry tarballs, so utoo clean works uniformly.

Test plan

  • cargo check -p utoo-ruborist passes (no feature)
  • cargo check -p utoo-ruborist --features native-git passes
  • cargo test -p utoo-ruborist — all tests pass
  • Verify OnceMap dedup under concurrent BFS load

Part of git dependency resolution: #2622PR2#2603-pr3

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings February 25, 2026 02:22
@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello @killagu, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances ruborist's dependency resolution capabilities by introducing native Git support. It enables the resolver to fetch and process packages directly from Git repositories, including GitHub, by leveraging the gix library. The changes include a new feature flag to make this functionality optional, a dedicated module for Git operations with built-in caching for efficiency, and updates to the core dependency processing logic to seamlessly integrate Git-based packages alongside registry packages. This is a crucial step in expanding ruborist's ability to handle diverse package sources.

Highlights

  • Native Git Resolution: Introduced a native-git feature flag to enable Git dependency resolution using the gix library.
  • Git Resolver Module: Added a new resolver/git.rs module containing the core logic for cloning Git repositories and resolving Git-based package specifications.
  • Concurrent Clone Deduplication: Implemented GIT_CLONE_CACHE using tokio::sync::OnceCell to prevent redundant concurrent Git clone operations for the same repository.
  • Configurable Git Cache Directory: Extended BuildDepsConfig with a cache_dir option, allowing users to specify the location for caching Git repositories.
  • BFS Integration: Modified the Breadth-First Search (BFS) process_dependency logic to route non-registry package specifications (e.g., git+https://, github:) through the new Git resolver.
Changelog
  • crates/ruborist/Cargo.toml
    • Added gix and tempfile as optional dependencies.
    • Defined a native-git feature that enables gix and tempfile.
  • crates/ruborist/src/lib.rs
    • Exported clone_repo from the new resolver::git module, conditional on the native-git feature.
  • crates/ruborist/src/resolver/builder.rs
    • Introduced conditional compilation for resolve_non_registry_dep, providing a real implementation when native-git is enabled and a stub otherwise.
    • Added cache_dir: Option<PathBuf> to BuildDepsConfig and its Default implementation.
    • Implemented with_cache_dir builder method for BuildDepsConfig.
    • Updated gather_preload_deps to use is_non_registry_spec for identifying non-registry dependencies.
    • Refactored process_dependency to accept &BuildDepsConfig instead of a boolean legacy_peer_deps.
    • Modified process_dependency to check if a spec is non-registry and, if so, route it through resolve_non_registry_dep.
    • Adjusted calls to add_edges_from and run_bfs_phase to pass the BuildDepsConfig or its relevant fields.
  • crates/ruborist/src/resolver/git.rs
    • Added a new module containing the core logic for Git cloning and resolution.
    • Implemented GIT_CLONE_CACHE for deduplicating concurrent clone operations.
    • Provided helper functions for GitHub authentication token injection, package.json parsing from Git trees, and recursive tree extraction.
    • Defined clone_repo_blocking for the actual gix-based cloning and caching.
    • Exposed clone_repo as an async public API for Git cloning.
    • Implemented resolve_non_registry_dep to parse Git specs, clone repositories, and build VersionManifest from cached package.json.
    • Included build_manifest_from_git_cache for creating manifests from cached Git packages.
  • crates/ruborist/src/resolver/mod.rs
    • Added pub mod git; conditionally compiled with the native-git feature.
  • crates/ruborist/src/resolver/registry.rs
    • Updated the ResolveError::Git enum variant to store both url and message for more detailed error reporting.
  • crates/ruborist/src/service/api.rs
    • Modified the build_deps_with_config function to pass the cache_dir to the BuildDepsConfig.
Activity
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces native git dependency resolution using the gix crate, gated behind a native-git feature flag. The implementation includes a new resolver/git.rs module with logic for cloning, caching, and resolving git dependencies. The changes are well-structured, with a clear separation of concerns between blocking git operations and the async resolver logic. The use of tokio::sync::OnceCell for deduplicating concurrent clones is a good approach. My feedback focuses on improving the robustness of error handling in the new git resolver by propagating errors instead of silently ignoring them, and refining error messages for clarity.

Comment thread crates/ruborist/src/resolver/git.rs Outdated
if entry.mode().kind() == gix::object::tree::EntryKind::BlobExecutable {
use std::os::unix::fs::PermissionsExt;
let perms = std::fs::Permissions::from_mode(0o755);
std::fs::set_permissions(&entry_path, perms).ok();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Silently ignoring the error from set_permissions with .ok() could lead to unexpected behavior if the executable bit is not set correctly on systems where it matters. It's better to propagate the error to make the cloning process more robust and debuggable.

Suggested change
std::fs::set_permissions(&entry_path, perms).ok();
std::fs::set_permissions(&entry_path, perms)?;

Comment thread crates/ruborist/src/resolver/git.rs Outdated
let obj = repo.find_object(entry.object_id())?;
let target = std::str::from_utf8(&obj.data).context("non-UTF-8 symlink target")?;
#[cfg(unix)]
std::os::unix::fs::symlink(target, &entry_path).ok();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Silently ignoring the error from symlink with .ok() could hide issues like lack of filesystem support or permission errors. It's better to propagate the error to ensure the cached package is a faithful representation of the git tree.

Suggested change
std::os::unix::fs::symlink(target, &entry_path).ok();
std::os::unix::fs::symlink(target, &entry_path)?;

Comment thread crates/ruborist/src/resolver/git.rs Outdated

// Extract full tree to cache
if package_dir.exists() {
std::fs::remove_dir_all(&package_dir).ok();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Silently ignoring the error from remove_dir_all with .ok() could hide underlying problems, such as file permission issues in the cache directory. This might lead to subsequent, harder-to-debug failures during extraction. It would be better to propagate the error.

Suggested change
std::fs::remove_dir_all(&package_dir).ok();
std::fs::remove_dir_all(&package_dir)?;

Comment thread crates/ruborist/src/resolver/git.rs Outdated
)
})
.await
.context("git resolver task panicked")?
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The error message "git resolver task panicked" is not entirely accurate. A JoinError from tokio::task::spawn_blocking can occur if the task is cancelled, not just when it panics. A more general message would be more appropriate.

Suggested change
.context("git resolver task panicked")?
.context("git resolver task failed")?

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a native-git-gated, gix-powered git clone backend and wires non-registry dependency specs into the BFS resolver, using a shared clone-dedup cache and a configurable cache directory to persist git checkouts alongside registry tarball cache layout.

Changes:

  • Introduce resolver/git.rs (clone + cache + git/GitHub spec resolution) behind the native-git feature flag.
  • Route non-registry specs through the git resolver during BFS dependency processing, and add BuildDepsConfig.cache_dir.
  • Add optional gix/tempfile dependencies and re-export clone_repo (feature-gated) from the crate.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
crates/ruborist/src/service/api.rs Passes cache_dir into BuildDepsConfig for git clone caching.
crates/ruborist/src/resolver/registry.rs Refines ResolveError::Git to include { url, message } for better error reporting.
crates/ruborist/src/resolver/mod.rs Adds the new git resolver module under cfg(feature = "native-git").
crates/ruborist/src/resolver/git.rs Implements gix-based cloning, cache extraction, dedup via OnceCell, and manifest building from cached package.json.
crates/ruborist/src/resolver/builder.rs Adds cache_dir to config and routes non-registry specs through git resolution in BFS.
crates/ruborist/src/lib.rs Re-exports clone_repo under native-git.
crates/ruborist/Cargo.toml Adds native-git feature and optional deps (gix, tempfile).

Comment thread crates/ruborist/src/resolver/git.rs Outdated

// Extract full tree to cache
if package_dir.exists() {
std::fs::remove_dir_all(&package_dir).ok();
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an existing package_dir is present, remove_dir_all errors are ignored (.ok()). If removal fails (permissions, locks, partial state), the subsequent extraction can leave a corrupted cache directory. Consider propagating the error (or at least logging and bailing) so callers don't proceed with a potentially inconsistent cache entry.

Suggested change
std::fs::remove_dir_all(&package_dir).ok();
std::fs::remove_dir_all(&package_dir).with_context(|| {
format!(
"failed to remove existing git package cache directory: {}",
package_dir.display()
)
})?;

Copilot uses AI. Check for mistakes.
Comment thread crates/ruborist/src/resolver/git.rs Outdated
.await?;

Ok(ResolvedPackage {
name: dep_name.to_string(),
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ResolvedPackage.name is set to dep_name, whereas registry resolution returns the package's canonical name from the manifest. For consistency (and to avoid surprising downstream consumers of ResolvedPackage), consider setting ResolvedPackage.name to result.name (the package.json name) and treating dep_name purely as the edge/install name handled by the graph.

Suggested change
name: dep_name.to_string(),
name: result.name,

Copilot uses AI. Check for mistakes.
Comment thread crates/ruborist/src/resolver/builder.rs Outdated
Comment on lines +345 to +349
// Route non-registry specs (git, github, …) through the
// built-in git resolver; everything else goes to the registry.
let resolved = if is_non_registry_spec(&edge_info.spec) {
match resolve_non_registry_dep(&config.cache_dir, &edge_info.name, &edge_info.spec)
.await
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_non_registry_spec() includes local specs (file:/link:/workspace:/portal:). Routing all of these through resolve_non_registry_dep will currently bail as "Unsupported non-registry spec" and then be surfaced as ResolveError::Git, breaking valid workspace/local dependency specs. Consider narrowing the routing condition to only git-like specs (git+, git://, github:) or parsing with PackageSpec::parse() and only invoking the git resolver for PackageSpec::Git|GitHub; local specs should be handled separately (e.g., resolved to workspace/link nodes or skipped).

Copilot uses AI. Check for mistakes.
Comment thread crates/ruborist/src/resolver/builder.rs Outdated
Comment on lines +345 to +366
// Route non-registry specs (git, github, …) through the
// built-in git resolver; everything else goes to the registry.
let resolved = if is_non_registry_spec(&edge_info.spec) {
match resolve_non_registry_dep(&config.cache_dir, &edge_info.name, &edge_info.spec)
.await
{
Ok(r) => r,
Err(_) if edge_info.edge_type == EdgeType::Optional => {
tracing::debug!(
"Skipped optional non-registry dependency {}@{}",
edge_info.name,
edge_info.spec
);
return Ok(ProcessResult::Skipped);
}
Err(e) => {
return Err(ResolveError::Git {
url: edge_info.spec.clone(),
message: e.to_string(),
})
}
}
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are existing async tests in this file, but none cover the new non-registry routing logic in process_dependency (git spec -> git resolver, optional non-registry failures -> skipped, non-optional failures -> ResolveError::Git). Adding targeted tests here would help prevent regressions (including the stubbed behavior when native-git is disabled).

Copilot uses AI. Check for mistakes.
Comment thread crates/ruborist/src/resolver/git.rs Outdated
Comment on lines +290 to +303
let key = format!("{}#{}", url, commit_ish.unwrap_or("HEAD"));

let cell = {
let mut cache = GIT_CLONE_CACHE.lock();
cache
.entry(key)
.or_insert_with(|| Arc::new(tokio::sync::OnceCell::new()))
.clone()
};

let clone_url = url.strip_prefix("git+").unwrap_or(url).to_string();
let original_url = url.to_string();
let cache_dir = cache_dir.to_path_buf();
let commit_ish_owned = commit_ish.map(|s| s.to_string());
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The clone dedup key is computed from the raw url (format!("{}#{}", url, ...)) but the actual clone uses clone_url with the git+ prefix stripped. This means callers using logically equivalent URLs (with vs without git+) won't dedup, and (more importantly) the key does not include cache_dir, so two callers using different cache roots in the same process can incorrectly share a cached result and receive a cache_path in the wrong directory. Consider canonicalizing the URL used for the key (strip git+, normalize) and including cache_dir (or otherwise scoping the cache per cache root).

Copilot uses AI. Check for mistakes.
Comment thread crates/ruborist/src/resolver/git.rs Outdated
Comment on lines +174 to +215
// Configure shallow fetch (depth 1)
prepare = prepare.with_shallow(gix::remote::fetch::Shallow::DepthAtRemote(
std::num::NonZeroU32::new(1).unwrap(),
));

// If commit_ish looks like a branch/tag, configure the ref to fetch
let is_full_sha =
commit_ish.is_some_and(|c| c.len() == 40 && c.chars().all(|ch| ch.is_ascii_hexdigit()));

if let Some(ref_name) = commit_ish
&& !is_full_sha
{
// Try as branch/tag ref
prepare = prepare.with_ref_name(Some(ref_name))?;
}

let (checkout, _outcome) = prepare
.fetch_only(gix::progress::Discard, &gix::interrupt::IS_INTERRUPTED)
.map_err(|e| {
let mut msg = format!("git fetch failed for '{}': {e}", clone_url);
if github_auth_token().is_none()
&& (clone_url.contains("github.com") || clone_url.contains("gitlab.com"))
{
msg.push_str(
"\n\nTip: set GITHUB_TOKEN or GH_TOKEN for private repos / higher rate limits",
);
}
anyhow!(msg)
})?;

// Resolve the commit
let commit_id = if is_full_sha {
let sha = commit_ish.unwrap();
gix::ObjectId::from_hex(sha.as_bytes())
.map_err(|e| anyhow!("invalid commit SHA '{}': {e}", sha))?
} else {
// HEAD of the fetched ref
checkout
.head_id()
.map_err(|e| anyhow!("could not resolve HEAD after fetch: {e}"))?
.detach()
};
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clone_repo_blocking always configures a shallow fetch with depth 1, but the code path for a full 40-hex commit_ish just constructs an ObjectId and then immediately tries to look it up in the repo. For non-HEAD commits this object typically won't be present in a depth-1 fetch, so resolving by full SHA will fail deterministically. Consider fetching the specific commit (or disabling shallow / increasing depth) when commit_ish is a full SHA, or otherwise documenting that only refs (branch/tag) are supported with shallow cloning.

Copilot uses AI. Check for mistakes.
Comment thread crates/ruborist/src/resolver/git.rs Outdated
Comment on lines +227 to +236
// Read name/version from package.json blob (cheap — single object read)
let (name, version) = read_pkg_name_version(&checkout, tree_id.into())?;

// Cache path: <cache_dir>/<name>/<commit_sha>/
// Using the full commit SHA as the version directory avoids collisions
// with registry versions and between different commits of the same repo.
// This also matches the layout registry tarballs use (<name>/<version>/),
// so `utoo clean` works uniformly.
let package_dir = cache_dir.join(&name).join(&commit_hex);
let resolved_marker = package_dir.join("_resolved");
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

package_dir is derived from the repo's package.json name and joined directly onto cache_dir. Since this value comes from untrusted content, a malicious package name containing path traversal (e.g. .. segments or platform separators) could escape the intended cache root and write files outside cache_dir. Consider validating/sanitizing the package name (e.g. enforce npm package name rules and reject .., absolute paths, Windows prefixes) before using it as a filesystem path component.

Copilot uses AI. Check for mistakes.
@killagu killagu force-pushed the pr2/bfs-git-resolution branch from 42fcfa3 to e9b77e0 Compare February 25, 2026 02:39
@killagu killagu force-pushed the pr1/git-spec-parser branch from 29a4461 to f636c48 Compare February 25, 2026 02:39
@killagu killagu force-pushed the pr2/bfs-git-resolution branch from e9b77e0 to dfda143 Compare February 25, 2026 03:08
@killagu killagu force-pushed the pr1/git-spec-parser branch from f636c48 to 23c8e43 Compare February 25, 2026 03:08
@killagu killagu force-pushed the pr2/bfs-git-resolution branch from dfda143 to ad7962f Compare February 25, 2026 03:21
@killagu killagu force-pushed the pr1/git-spec-parser branch 2 times, most recently from 53c271e to 41d9077 Compare February 25, 2026 03:35
@killagu killagu force-pushed the pr2/bfs-git-resolution branch from ad7962f to 52e1b13 Compare February 25, 2026 03:35
@killagu killagu force-pushed the pr1/git-spec-parser branch 7 times, most recently from 3be048c to e07462e Compare February 25, 2026 12:54
Base automatically changed from pr1/git-spec-parser to next February 26, 2026 03:06
@xusd320 xusd320 self-requested a review February 26, 2026 03:31
@killagu
Copy link
Copy Markdown
Contributor Author

killagu commented Feb 26, 2026

Code Guard Review

审查维度:类型建模、命名语义、match 穷举性、废弃策略、错误消息质量、项目约定。


🔴 Must Fix (3 项)

1. process_dependency 中 enum 分类重复实现 — builder.rs

process_dependencymatches! 探针对 PackageSpec 变体做路由,而不是直接 exhaustive match。每新增一个 PackageSpec 变体(如 HttpTarball),编译器不会在路由处报错,只会悄悄走 registry 路径。

// 现在(有问题):
let resolved = if matches!(
    crate::model::spec::PackageSpec::parse(&edge_info.spec),
    crate::model::spec::PackageSpec::Git { .. }
        | crate::model::spec::PackageSpec::GitHub { .. }
) { ... } else { ... };

// 修改后 — parse 一次,exhaustive match:
let resolved: ResolvedPackage = match PackageSpec::parse(&edge_info.spec) {
    PackageSpec::Git { .. } | PackageSpec::GitHub { .. } => {
        resolve_non_registry_dep(cache_dir, &edge_info.name, &edge_info.spec).await
            .map_err(|e| ResolveError::Git {
                url: edge_info.spec.clone(),
                message: e.to_string(),
            })?
    }
    PackageSpec::Registry { .. } => {
        match resolve_dependency(registry, &edge_info.name, &edge_info.spec, &edge_info.edge_type).await? {
            Some(r) => r,
            None => return Ok(ProcessResult::Skipped),
        }
    }
    // 编译器强制穷举:新增变体 = 编译错误,而不是 runtime 静默误判
};

2. is_non_registry_spec 遗漏 HTTP tarball — preload.rs

is_non_registry_spec 遗漏了 https://*.tgz 格式,导致 HTTP tarball 依赖会被错误地发往 registry 预加载,产生 404。这是与 PackageSpec enum 独立维护两套分类逻辑的反模式(A1)。

建议:删除 is_non_registry_spec,所有调用处改为:

!matches!(PackageSpec::parse(spec), PackageSpec::Registry { .. })

编译器保证分类逻辑永远与 enum 同步,新增变体时不会遗漏。

3. 缓存写入存在 TOCTOU 竞态 — git.rs clone_repo_blocking

resolved_marker.exists() 检查和后续写入之间没有原子保证。两个并发进程 resolve 相同 commit 时,第二个进程会在第一个正在 extract 时删除并重建 package_dir,产生损坏的部分提取目录,且该目录通过 resolved_marker.exists() 检查被当作合法缓存复用。

// 修改后 — 写入临时目录,再原子 rename:
let tmp_dir = package_dir.with_extension("tmp");
if tmp_dir.exists() {
    std::fs::remove_dir_all(&tmp_dir)?;
}
std::fs::create_dir_all(&tmp_dir)?;
extract_tree_to_dir(&checkout, tree_id.into(), &tmp_dir)?;
std::fs::write(tmp_dir.join("_resolved"), "")?;
std::fs::rename(&tmp_dir, &package_dir)?; // POSIX 保证原子性

🟡 Should Fix (5 项)

4. github_auth_token() 调用两次 — git.rs

clone_repo_blocking 函数顶部调用一次用于注入 token,map_err closure 内又调用一次检查是否为 None,逻辑不一致且冗余。

// 修改后:
let token = github_auth_token(); // 捕获一次
let effective_url = match token {
    Some(ref t) => inject_token_into_url(clone_url, t),
    None => clone_url.to_string(),
};
// map_err 中复用:
if token.is_none() && (clone_url.contains("github.com") || ...) { ... }

5. NonZeroU32::new(1).unwrap()git.rs

项目使用 nightly toolchain,.unwrap() 此处无必要,建议改用:

// Option A:语义最清晰
NonZeroU32::MIN  // 值保证为 1,stable 1.70+

// Option B:const 上下文,编译期验证
const DEPTH_ONE: NonZeroU32 = NonZeroU32::new(1).unwrap();

6. with_cache_dir(Option<PathBuf>) 参数类型 — builder.rs

项目 builder 惯例是接受值类型,调用方不应手写 Some(...)

// 修改后:
pub fn with_cache_dir(mut self, cache_dir: PathBuf) -> Self {
    self.cache_dir = Some(cache_dir);
    self
}
// 调用方:builder.with_cache_dir(path)  而非  builder.with_cache_dir(Some(path))

7. ResolveError::Git 丢失错误链 — registry.rs

message: String 导致 source() 返回 Noneanyhow 和诊断工具无法回溯原始 gix 错误:

// 修改后:
ResolveError::Git { url: String, source: anyhow::Error }

// Display:
ResolveError::Git { url, source } => write!(f, "Git resolution failed for {}: {}", url, source)

// source():
ResolveError::Git { source, .. } => source.source()

8. is_local_spec 未删除,仍在 builder.rs 中使用 — preload.rs

preload.rsis_local_spec 加了 #[deprecated],但 builder.rs 仍在 import 并调用它。内部函数应直接删除并更新所有调用处,不应走废弃路径。


⚪ Style Suggestion (2 项)

9. 错误消息引号不一致 — registry.rs

现有变体无引号,新增 Git 变体的 URL 有单引号,建议保持一致:

// 现在: "Git resolution failed for '{}': {}"
// 建议: "Git resolution failed for {}: {}"

10. submodule 条目静默跳过缺少 trace — git.rs

gix::object::tree::EntryKind::Commit => {
    // 建议添加:
    tracing::debug!("Skipping submodule entry at {:?} (submodules are not resolved)", entry.filename());
}

核心总结

三个 Must Fix 共享同一根因:PackageSpec enum 被引入作为 spec 分类的唯一真相来源,但 process_dependency 路由和 is_non_registry_spec 各自独立重新实现了子集分类逻辑。统一修复方案是删除所有独立分类器,所有调用处改为 exhaustive match PackageSpec::parse(spec) { ... },让编译器强制穷举性保证,避免新增变体时出现静默误判。

🤖 Generated with Claude Code

@killagu killagu force-pushed the pr2/bfs-git-resolution branch 2 times, most recently from af81f2d to 9ff74f3 Compare February 26, 2026 05:52
@killagu killagu changed the title feat(ruborist): add native-git BFS resolution with OnceMap dedup (2/3) feat(pm): add native-git BFS resolution (2/3) Feb 26, 2026
@killagu killagu force-pushed the pr2/bfs-git-resolution branch from 9ff74f3 to ab5c43d Compare February 26, 2026 05:53
Comment thread crates/ruborist/src/model/git.rs Outdated
/// re-reading the file from disk.
///
/// [`VersionManifest`]: crate::model::manifest::VersionManifest
pub pkg_json: serde_json::Value,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

禁止出现 serde_json::value,直接使用 versionManifest
同时需要注意 package.json 和 versionManifest 的区别

Comment thread crates/ruborist/src/model/spec.rs
Comment thread crates/ruborist/src/resolver/builder.rs Outdated
let parsed_spec = PackageSpec::from(edge_info.spec.as_str());
let resolved: ResolvedPackage = match &parsed_spec {
PackageSpec::Git { .. } | PackageSpec::GitHub { .. } => {
match resolve_non_registry_dep(config.cache_dir.as_deref(), &parsed_spec).await
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolve_non_registry_dep 内未来还需要实现 sepc => version 过期判断

目前 resolve_dependency 的命名不够语义化,没有体现缓存功能

}
}
}
PackageSpec::Local { .. } => {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

workspace: 目前已经支持了,需要考虑

Comment thread crates/ruborist/src/resolver/git.rs Outdated
///
/// Only reads the single `package.json` blob — does **not** extract the full
/// tree, so this is cheap even for large repositories.
fn read_pkg_json(repo: &gix::Repository, tree_id: gix::ObjectId) -> Result<serde_json::Value> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

禁止 serde_jons::Value 😈

Comment thread crates/ruborist/src/resolver/git.rs
@@ -0,0 +1,547 @@
//! Git clone backend powered by `gix`.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

git.rs 目前都是同步方法,目前 pipeline 模式都是异步运行时,看看有 async 的解决方案吗?

Comment thread crates/ruborist/src/resolver/mod.rs
Comment thread crates/ruborist/src/resolver/registry.rs
/// 2. Clones the repository via [`ensure_repo_cached`].
/// 3. Builds a [`VersionManifest`] from the cached `package.json` (no extra
/// disk I/O — the parsed JSON is already in [`GitCloneResult::pkg_json`]).
pub(crate) async fn resolve_non_registry_dep(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub(crate) async fn resolve_non_registry_dep(
// TODO: refactor this to be more extendable
pub(crate) async fn resolve_non_registry_dep(

@killagu killagu force-pushed the pr2/bfs-git-resolution branch 3 times, most recently from 46218c2 to d5364c7 Compare February 26, 2026 07:49
- Add gix/tempfile as optional deps behind `native-git` feature flag
- Add resolver/git.rs with clone_repo (async), clone_repo_blocking,
  resolve_non_registry_dep, build_manifest_from_git_cache
- Route non-registry specs through git resolver in BFS process_dependency
- Add GIT_CLONE_CACHE with tokio::sync::OnceCell for concurrent dedup
- Use tokio::fs for non-blocking package.json reads (arch review fix)
- Add BuildDepsConfig.cache_dir with builder pattern
- Update ResolveError::Git to structured { url, message } fields
- Keep BuildDepsOptions::new() (no public API removal)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@killagu killagu force-pushed the pr2/bfs-git-resolution branch from d5364c7 to 0120a57 Compare February 27, 2026 04:01
Comment thread crates/ruborist/src/resolver/git.rs Outdated
url: &str,
commit_ish: Option<&str>,
) -> Result<Arc<GitCloneResult>> {
// Normalize the dedup key: strip `git+` so `git+https://…` and `https://…`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clone_repo_blocking 每次都走网络,不读磁盘缓存

                                                                                                                                                                                                                                          
  fn clone_repo_blocking(...) -> Result<GitCloneResult> {                                                                                                                                                                                  
      let temp_dir = tempfile::tempdir()?;        // ← tempdir 创建                                                                                                                                                                        
      let (checkout, _outcome) = prepare.fetch_only(...)?;  // ← 网络 fetch 总是执行                                                                                                                                                       
                                                                                                                                                                                                                                           
      // ... 计算 commit_hex, package_dir, resolved_marker ...                                                                                                                                                                             
                                                                                                                                                                                                                                           
      if resolved_marker.exists() {              // ← 检查缓存在 fetch 之后!                                                                                                                                                              
          return Ok(GitCloneResult { ... });                                                                                                                                                                                               
      }
      // ... extract ...                                                                                                                                                                                                                   
  }                                                                                                                                                                                                                                        
     

_resolved 标记用来做进程间原子性保证,但它是在完成网络 fetch 之后才检查的。这意味着每次 utoo install 运行,即使包已经在磁盘缓存里,都会重新 clone。

缓存路径在 fetch 前就完全确定(cache_dir///),可以提前检查。修复思路:

  // 仅对全 SHA 可提前检查(branch/tag refs 必须 fetch 才能知道当前 HEAD)
  if let Some(sha) = commit_ish.filter(|s| s.len() == 40 && s.chars().all(|c| c.is_ascii_hexdigit())) {
      // 但 name 未知……需要从磁盘 package.json 读取
      // 或者维护一个 url->name 的索引文件
  }

killagu and others added 2 commits February 27, 2026 21:58
Scan the disk cache before any network I/O when commit_ish is a full
40-char hex SHA. This avoids re-cloning git dependencies on every
`utoo install` when they are already cached at `<cache_dir>/<name>/<sha>/`.

Branch/tag refs still always fetch (the ref may have moved forward).
On cache-read failure the code falls through to the normal network path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Git { url: String, message: String },
Git {
url: String,
source: Box<dyn std::error::Error + Send + Sync + 'static>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ResolveError 其他 variant 用的是具体类型(deno_semver::VersionError、E),唯独 Git 用了 trait object。这破坏了 enum 的类型安全优势——Comprehensive Rust 的 Error trait 章节说过
"boxing errors gives up the ability to cleanly handle different error cases"。

建议: 用 #[from] 配合具体的 GitResolveError 类型,或至少用 anyhow::Error 替代裸 Box 以保持一致。

Comment thread crates/ruborist/src/resolver/git.rs Outdated
/// same repo simultaneously, only one clone is performed; the others await
/// the result. Matches the `DOWNLOAD_CACHE` pattern in the PM crate's
/// `downloader.rs`.
type CloneCache = Mutex<HashMap<String, Arc<tokio::sync::OnceCell<Arc<GitCloneResult>>>>>;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type CloneCache = Mutex<HashMap<String, Arc<tokio::sync::OnceCell<Arc>>>>;
static GIT_CLONE_CACHE: LazyLock = LazyLock::new(|| Mutex::new(HashMap::new()));

Comprehensive Rust 并发章节强调在 async 上下文中要注意 mutex 的选择。这里用 parking_lot::Mutex(同步锁)来保护一个在 async 函数中使用的 HashMap。虽然锁持有时间很短(只是
entry().or_insert_with()),理论上不会 block executor,但有两个问题:

  • 全局 static 无法被 test/clean/drop:进程生命周期内缓存永远增长,utoo clean 只清磁盘但内存 map 不清。
  • 跨 test 泄漏:#[tokio::test] 之间共享全局状态。

建议: 将 CloneCache 作为 BuildDepsConfig 或 resolver context 的一个字段(Arc),而不是 static。

可以参照一下目前 VersionManifestCache 相关实现

Comment thread crates/ruborist/src/resolver/git.rs Outdated
/// `<cache_dir>/@<scope>/<name>/<sha>/_resolved` for scoped packages.
/// Returns the package directory path if found.
fn find_cached_git_package(cache_dir: &Path, sha: &str) -> Option<PathBuf> {
let entries = std::fs::read_dir(cache_dir).ok()?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

为了找某个 SHA 的缓存目录,遍历了 cache_dir 下所有顶层目录,包括 scoped package 的二级目录。如果缓存目录有大量包,这会很慢。

其实可以直接读取 name 后判断,因为我们虽然不知道版本号,但是 name 是确定的

@elrrrrrrr
Copy link
Copy Markdown
Contributor

还需要审视一下目前异常处理、全局缓存方案,结构要保持一致。

  1. 不能存在 cache_dir 目录下全文件扫描,由于在同步方法的上下文内,获取 hash 成本会很高
  2. 注意不用 mutex、box dyn error 等方式,架构需要 ruborist 和其他 resolver.rs 保持一致

killagu and others added 2 commits February 28, 2026 00:22
…yle cleanup

- Delete directory scan (find_cached_git_package), use direct path check
  with caller-supplied name instead
- Move global static GIT_CLONE_CACHE to BuildDepsConfig field (async
  tokio::sync::Mutex, scoped to session lifetime)
- ResolveError::Git source: Box<dyn Error> → anyhow::Error
- Fix symlink escape: validate targets stay within extraction root
- Fix path traversal: validate name before early cache lookup
- Fix token injection: skip URLs with existing credentials
- Fix staging dir race: use tempfile::tempdir_in() for unique dirs
- Add PackageSpec::clone_url() to centralize git+ prefix stripping
- Add GitCloneResult::new() constructor with debug_assert consistency
- Unify tracing levels (warn → debug), divider style, doc headers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ResolveError::Git.source is now anyhow::Error, so the conversion
from anyhow::Error is identity — clippy::useless_conversion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@killagu killagu force-pushed the pr2/bfs-git-resolution branch from 715dfb4 to 4f6105d Compare March 5, 2026 13:30
Resolve conflicts:
- Cargo.toml: add gix/tempfile workspace deps with next's formatting
- crates/ruborist/Cargo.toml: keep native-only deps, fix tombi format
- crates/ruborist/src/resolver/builder.rs: reconcile next's refactored
  manifest deref with PR's config.legacy_peer_deps, add missing HashMap
  import
- Cargo.lock: preserve existing dependency versions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@killagu killagu force-pushed the pr2/bfs-git-resolution branch from 4f6105d to a2df188 Compare March 5, 2026 13:44
@elrrrrrrr elrrrrrrr merged commit 8cd5311 into next Mar 5, 2026
23 checks passed
@elrrrrrrr elrrrrrrr deleted the pr2/bfs-git-resolution branch March 5, 2026 14:17
killagu added a commit that referenced this pull request Mar 8, 2026
…dback

Address review comments from PR #2622/#2623:

Phase 1 (correctness):
- Add Local/Http variants to PackageSpec, unify is_non_registry_spec via parse delegation
- Convert process_dependency to exhaustive match on PackageSpec variants
- Fix TOCTOU race in cache writes with atomic tmp-dir + rename
- Replace serde_json::Value with typed PkgNameVersion and direct VersionManifest deserialization

Phase 2 (code quality):
- Add thiserror derive for ResolveError, preserve error chain in Git variant
- Convert GIT_CLONE_CACHE static to CloneCache instance variable in BuildDepsConfig
- Consolidate cfg(native-git) checks inside git.rs, remove stubs from builder.rs
- Move std::env reads out of ruborist into PM's git_resolver.rs (auth_token parameter)
- Small fixes: NonZeroU32::MIN, with_cache_dir(PathBuf), cache_path→store_path,
  submodule skip trace, error message quote consistency

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants